CHAPTER I:

INTRODUCTION AND OVERVIEW 



In June, 1964 the Center for the Study of Medical Education at 
the University of Illinois College of Medicine embarked on a four- 
year joint study with the American Board of Orthopaedic Surgery in 
an attempt to develop improved methods of assessing competence in 
the field of orthopaedic surgery. The study was supported by the 
Bureau of State Services of the Public Health Service as one
approach to obtaining better utilization of health manpower by 
means of increased flexibility and efficiency in the training of 
health professionals. 

The following were deemed essential to the accomplishment
of this goal: 

1. Precise identification of the components of professional 
competence in the field 

2. Development of valid and reliable techniques for assessing
these components of competence 

3. Identification of variations in the patterns of competence 
and differential rates in their achievement associated with 
variations in training programs 

Consequently, the first stage of the investigation was devoted
to a critical incident study¹ of the essential performance requirements
for orthopaedic surgeons. The resulting definition of competence
included 9 major categories (such as "Skill in Gathering Clinical
Information," "Competence in Developing a Diagnosis," and "Effectiveness
of Physician-Patient Relationship") and 94 subcategories of behavior.²*
This definition of competence served to direct all subsequent stages
of the study.

At the same time that the critical incident study was being
conducted, a task force of orthopaedic surgeons under the direction
of the Center staff analyzed the behaviors sampled by the written
examinations currently in use by the Board and concluded that
most questions required primarily the ability to recall isolated
fragments of information. An analogous observational study of the
oral examinations yielded similar results.

* See Chapter II and Appendix 1.

The next stage of the study was therefore devoted to the
improvement of conventional techniques and to the development
of new ones designed to assess the other components of competence
identified in the critical incident study. During this period,
therefore, in addition to improving the quality of multiple choice
exercises, the following new techniques were developed:



1. Written simulation exercises

2. Oral simulation exercises utilizing role-playing
techniques

3. Oral exercises testing complex cognitive abilities

4. Rating forms for evaluating habitual performance

5. Rating forms for evaluating specific abilities in
single observations

To study the validity and reliability of these techniques 
several forms of these examinations were developed and admin- 
istered to the following populations of examinees: 

1. A final orthopaedic certification examination (OCE) was
administered to 4 populations composed of candidates
for certification who had completed their residency
and were currently in orthopaedic practice;

2. A prerequisite certifying examination (OCE-I) was
administered to 2 populations composed of candidates
for certification who were in their last year of
residency training;

3. An In-Training Examination (ITE) was administered by the
American Academy of Orthopaedic Surgeons, for diagnostic
purposes, to 3 populations composed of virtually all
residents currently in training;

4. The written simulation exercises from one of the final
orthopaedic certification examinations (OCE) were
administered for experimental purposes to all Board
examiners.



The following forms of the above listed examinations were
analyzed to obtain reliability and validity data on both the
new and the more conventional techniques:

The January 1965, 1966, 1967 and 1968 final Orthopaedic
Certification Examinations; the May 1965 and 1966 prerequisite
Certification Examinations-I; and the November 1965, 1966 and
1967 In-Training Examinations.

The reliability of the written forms was assessed by analysis
of internal consistency; the reliability of the orals was assessed
by analysis of inter-rater agreement and by correlation of alternate
forms; the reliability of ratings of habitual performance was
assessed by analysis of inter-rater agreement.
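
As an illustration of what an internal-consistency analysis of a written form involves, the sketch below computes Kuder-Richardson formula 20 for right/wrong item scores. The report does not say which coefficient was actually used, so the choice of KR-20, the function name `kr20`, and the sample data are all assumptions made for illustration.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores.

    `responses` holds one list per examinee, one 0/1 entry per item.
    (Hypothetical helper; not taken from the report.)
    """
    n_examinees = len(responses)
    n_items = len(responses[0])

    # Variance of the examinees' total scores (population form).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_examinees
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_examinees

    # Sum over items of p * (1 - p), where p is the proportion correct.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_examinees
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: five examinees, four multiple choice items.
sample = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(sample), 2))  # prints 0.8
```

The coefficient rises as items rank examinees more consistently, which is why a long multiple choice test with many independent items tends to score well on this kind of analysis.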

The content validity of the examinations was assessed by
process analysis.³ Their construct validity was assessed by
testing hypotheses regarding the relationship between performance
on the examination and such factors as age, experience, and
practice settings; additional data on construct validity were
obtained from various correlational and factor analytic studies.
Concurrent validity of the tests was assessed by correlational
and multiple regression analysis of the relationships between
test scores and supervisors' ratings. Predictive validity of
the examinations is to be assessed in a 10-year follow-up study.*
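
The correlational side of a concurrent validity analysis can be sketched as a product-moment correlation between examination scores and supervisors' ratings of the same residents. The function name `pearson_r` and the data below are hypothetical; the report's own analyses also used multiple regression, which this sketch does not attempt.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for five residents.
test_scores = [4, 6, 8, 10, 12]        # examination scores
supervisor_ratings = [2, 4, 5, 4, 5]   # supervisors' ratings

print(round(pearson_r(test_scores, supervisor_ratings), 2))  # prints 0.77
```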

The detailed results of these analyses are discussed in 
Chapters IV through X below; however, the initial findings can 
be briefly summarized as follows: 

1. Numerous skills and abilities are requisite to competence 
in orthopaedic surgery and the correlation between certain 
of them is relatively low, e.g., "Surgical Skill" versus 
"Ability to Relate to Patients." 

2. Each of the new evaluation techniques (as listed earlier)
appears to measure certain independent aspects of
competence not assessed by other techniques.

3. These newer and more complex techniques tend to be less
reliable than conventional "objective" (i.e., multiple
choice) techniques, primarily because fewer independent
samples of behavior can be obtained in a given
time; however, they are considerably more reliable
than techniques which depend upon ratings (either of
habitual behavior or of single incidents), and their
reliability can be increased by pooling response
data from a number of techniques and from repetitions
of one technique to arrive at a composite score.

4. The concurrent validity of this composite score appears
to be substantially higher than that of scores obtained
from conventional techniques of testing and of pooling
response data.
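
The claim in finding 3 that pooling raises reliability can be illustrated with the Spearman-Brown formula, which projects the reliability of a composite of k comparable components from the reliability of one. This is a standard psychometric result offered as a sketch; the report does not state that this formula was used, and the starting reliability of .40 is a made-up figure.

```python
def spearman_brown(reliability, k):
    """Projected reliability of a composite of k parallel components.

    Assumes the components are comparable (parallel) measures, which
    real exercises only approximate.
    """
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical single-exercise reliability of .40, pooled over 1 to 4 exercises.
for k in (1, 2, 3, 4):
    print(k, round(spearman_brown(0.40, k), 2))
```

Under these assumptions the projected reliability climbs from .40 for one exercise to about .73 for four, which is the pattern finding 3 describes.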

* See Chapter XII and Appendix 30 

On the basis of these results the study staff recommended restructuring
the entire certification procedure. These recommendations and
the supporting data were reviewed by a group of special Task Forces
appointed by the Board. The consequent modifications in the
certification procedures, as adopted by the Board, were first fully
implemented in the January, 1968 Final Certification Examination.

Table 1 summarizes the differences between that examination and the
one administered in 1964, the year just prior to the initiation of
the Orthopaedic Training Study.

With the elaboration of a behavioral definition of competence in
orthopaedic surgery, the development of more valid and reliable
techniques for assessing these competencies and the incorporation of these
techniques in an integrated certification system, it has now become
possible to identify more precisely the relationship between variations
in training programs and differential rates and patterns of achievement.
The present study, therefore, serves both as one type of model for
professional self study and as the indispensable prerequisite for a
further study of methods of increasing the efficiency and effectiveness
of training in this specialty. The first section of the present study
report is devoted to a discussion of the rationale and findings of the
prior analyses required in the development of new methods of evaluating
professional competence in this specialty; the second section contains
a description of each new assessment technique developed during this
study, the methods employed in analyzing the validity and reliability
of each, and the findings from each such analysis; the third section
summarizes the methods used in implementing new certification
procedures based on the research findings, and outlines plans for
subsequent study.




1 John C. Flanagan, "The Critical Incident Technique," Psychological
Bulletin, July, 1954, Vol. 51, No. 4, pp. 327-358.

2 J. Michael Blum and Robert Fitzpatrick, Critical Performance
Requirements for Orthopaedic Surgery (2 volumes). Pittsburgh, Pa.:
American Institutes for Research, 1965.

3 Christine McGuire, "A Process Approach to the Construction and
Analysis of Medical Examinations," The Journal of Medical
Education, Vol. 38, No. 7, July, 1963.

TABLE 1: TWO SYSTEMS OF BOARD CERTIFICATION

Aspect           January 1964                      January 1968
---------------  --------------------------------  --------------------------------
Requirements     Completion of an approved         Same as 1964
for eligibility  residency

                 Completion of two years of        Completion of one year of
                 practice or its equivalent        practice or its equivalent

                 Satisfactory completion of        Requirement eliminated
                 Part I examination taken at
                 end of residency

                 Letters of recommendation         Same as 1964 plus submission
                 from chief of service and         of standardized Candidate
                 current colleagues                Rating Form by training chiefs

Method of        Various subject matter parts      Detailed set of specifications
preparation of   of the written assigned to        with respect to "content" and
examination      different members of the          process, established by the
                 Examination Committee for         Examination Committee and
                 development; all materials        approved by Board; preparation
                 reviewed by entire Committee      of materials (both written and
                 in 2-3 day meeting.               oral) to meet these
                                                   specifications assigned to
                                                   various task forces;
                                                   preliminary form of written
                                                   test administered to entire
                                                   Board; data from this initial
                                                   tryout reviewed by Examination
                                                   Committee as basis for
                                                   composing test in final form;
                                                   all materials reviewed by
                                                   Examination Committee

TABLE 1: (Cont'd)

Aspect           January 1964                      January 1968
---------------  --------------------------------  --------------------------------
Standards for    Oral: 75 or better on             Achievement of the
Certification    every oral                        pre-established "minimum
                                                   passing level" on the weighted
                 Written: Not lower than           total score and on the Recall
                 one standard deviation            and Problem-Solving factors;
                 below the mean of a subgroup      and achievement of "marginal
                 composed of all graduates of      level" on at least 2 of the
                 U.S. medical schools who had      4 factors including either
                 no previous failures on the       the Recall OR Problem-Solving
                 Board Examination.                factor.

Feedback to      Scores reported as Pass-Fail      Scores reported as overall
Candidates       on each technique: written or     Pass-Fail together with a
                 oral and, within the orals, on    report of deficiency on any
                 each discipline.                  factor.

Training of      Informal induction in an          Formal 1-2 day workshops on
Examiners        apprentice-like system with       test construction for authors
                 general guidelines explained      of written test.
                 by subject advisers in an
                 evening pre-session and           Formal 1-2 day training
                 general postmortem by panel       sessions on administering
                 advisers in an evening            and scoring oral examinations.
                 post session.

Scoring of       Separate scores (based on         One or more of the following
Examination      scale of 100) derived for         sub-scores derived from each
                 the written and each oral.        test: Recall, Observation and
                                                   Interpretation,
                                                   Problem-solving, Ability to
                                                   Communicate with patients and
                                                   colleagues; these sub-scores
                                                   converted to a common 12-point
                                                   scale and combined across
                                                   tests to yield an overall
                                                   score on each factor named
                                                   above and a weighted total
                                                   score on all factors combined.

TABLE 1: (Cont'd)

Aspect           January 1964                      January 1968
---------------  --------------------------------  --------------------------------
Method of        Orals in specific subjects        Standardized cases and
preparation of   developed by individual           related materials for orals
examination      examiners in accord with          prepared by task forces and
                 general guidelines; no            reviewed by the Examination
                 prior review of these             Committee
                 materials

Method of        2 hour multiple choice plus       2 hour multiple choice test
Examination      2 1/2 hour oral consisting of     plus 1 hour written simulated
                 1/2 hour oral quiz in each of     patient management problems
                 5 subject fields: adult,
                 children's, trauma, anatomy       3 half-hour orals on patient
                 and pathology; variously          management problems in adult,
                 designed questions and case       children's and trauma,
                 materials individually            utilizing standardized cases
                 prepared and selected by          administered by trained
                 examiner.                         examiners, plus 1 half-hour
                                                   oral on interpretation of
                                                   X-rays and histologic
                                                   materials, using standardized
                                                   cases administered by trained
                                                   examiners, plus 1 half-hour
                                                   oral simulating various
                                                   physician-patient and
                                                   physician-colleague
                                                   encounters, utilizing
                                                   standardized case materials
                                                   administered by trained
                                                   examiners.

                 Most questions in the oral        Most questions in both orals
                 and written designed to           and written designed to assess
                 assess recall of                  skills other than recall
                 information
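
The January 1964 standards in Table 1 amount to a simple decision rule, sketched below. The table does not say whether the standard deviation was computed in its population or sample form, nor does it state outright that both the oral and written standards had to be met; both points are assumptions here, as are the function names and the scores.

```python
from statistics import mean, pstdev


def written_cutoff_1964(reference_scores):
    """One standard deviation below the mean of the reference subgroup.

    Assumption: population standard deviation (pstdev).
    """
    return mean(reference_scores) - pstdev(reference_scores)


def passes_1964(written, orals, reference_scores):
    """Assumed rule: 75 or better on every oral AND written at or above the cutoff."""
    oral_ok = all(score >= 75 for score in orals)
    written_ok = written >= written_cutoff_1964(reference_scores)
    return oral_ok and written_ok


# Hypothetical written scores for the reference subgroup (scale of 100).
reference = [60, 70, 80, 90, 100]

print(round(written_cutoff_1964(reference), 1))           # prints 65.9
print(passes_1964(70, [80, 75, 90, 85, 78], reference))   # prints True
print(passes_1964(70, [80, 74, 90, 85, 78], reference))   # prints False
```
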

CHAPTER II

DEFINITION OF PROFESSIONAL COMPETENCE: 
THE CRITICAL INCIDENT STUDY 



Obtaining a meaningful statement of behavioral objectives in a
form suitable for the direction of medical education has often
proved difficult. First, the objectives developed by subject matter
specialists are typically so vague (e.g., to produce physicians who
are good at critical thinking) as to provide little guidance to those
responsible for program planning. Secondly, the objective sought by one
instructor may not be shared by his colleagues and there is little
basis for choice between differing views. What is needed, therefore,
is a list of objectives which are specific enough to use as a guide
in developing instructional programs and evaluation instruments,
and which are at the same time general enough to be acceptable to
all those responsible for the educational program. One method of
defining objectives which meets these criteria is that developed
during World War II by Flanagan and his associates in an attempt
to improve the efficiency of pilot training. Very briefly, this
approach, known as the "Critical Incident Technique," consists
in collecting descriptions of several thousand specific incidents
involving effective or ineffective performance by individuals in
training. These incidents are reviewed and classified in empirically
derived categories that describe the essential element of behavior
that seems to account for the effective or ineffective performance.¹
In the present study over 1700 such incidents involving effective
and ineffective performance of orthopaedic surgeons were collected
from the almost 3,000 members of the specialty contacted during
the first year of the study. The number and sources of the
incidents collected are shown in Table 2. These were classified
into the types of categories listed in Exhibit I. (For a complete
list of all sub-categories see Appendix 1.)

The types of incidents collected and the nature of the
categories derived from them are illustrated in Exhibit II.² From the
examples given it is obvious that the specific incidents can be
used to define the components of competence in behavioral terms.

The critical incident technique therefore provides the optimum
in specificity. Furthermore, since the categories are obtained
empirically, the technique provides a basis for consensus
concerning those aspects of professional competence that should
be evaluated. Finally, it makes explicit the categories which
qualified experts who are broadly representative of the specialty
actually use to make value judgments about performance.

Exhibit I:

Major Categories and Illustrative Sub-Categories Defining
Critical Performance Requirements for Orthopaedic
Surgery

I. Skill in Gathering Clinical Information
   A. Eliciting Historical Information
   B. Obtaining Information by Physical Examination
   C. Etc...

II. Effectiveness in Using Special Diagnostic Methods
   A. Obtaining and Interpreting X-Rays
   B. Obtaining Additional Information by Other Means
   C. Etc...

III. Competence in Developing a Diagnosis
   A. Approaching Diagnosis Objectively
   B. Recognizing Condition
   C. Etc...

IV. Judgment in Deciding on Appropriate Care
   A. Adapting Treatment to the Individual Case
   B. Determining Extent and Immediacy of Therapy Needs
   C. Etc...

V. Judgment and Skill in Implementing Treatment
   A. Planning the Operation
   B. Making Necessary Preparations for Operating
   C. Modifying Operative Plans According to Situation
   D. Etc...

VI. Effectiveness in Treating Emergency Patients
   A. Handling Patient
   B. Performing Emergency Treatment
   C. Etc...

VII. Competence in Providing Continuing Care
   A. Attention Post-Operatively
   B. Monitoring Patient's Progress
   C. Etc...

VIII. Effectiveness of Physician-Patient Relationship
   A. Showing Concern and Consideration
   B. Relieving Anxiety of Patient and Family
   C. Etc...

IX. Accepting Responsibility for Welfare of Patient
   A. Accepting Responsibility for Welfare of Patient
   B. Recognizing Professional Capabilities and
      Limitations
   C. Relating Effectively to Other Medical Persons
   D. Etc...










Exhibit II

IV. Judgment in Deciding on Appropriate Care

D. Modifying operative plans according to situation

2. Improvising with implements and materials. The orthopaedist
can use materials in makeshift or innovative fashion.

EFFECTIVE

Situation:        During an open reduction of a compound fractured
                  radius, part of the bone was so comminuted that
                  it could not be reapproximated so as to restore
                  length.

Experience:       This was a Board certified physician.

Action:           Took graft from pelvis and shaped it to resemble
                  the missing bone. Placed Rush Rod down center
                  of graft and replaced it in forearm.

Why Effective:    This idea was original, not preconceived,
                  and answered the problem at hand.

Less Effective:   Accept reduction.


VIII. Effectiveness of Physician-Patient Relationship

B. Relieving anxiety of Patient and Family

treatment, proposals or complication.
The orthopaedist informs the patient and/or family, in terms
which they can understand, of the progress of therapy.

INEFFECTIVE

Situation:        Child with Legg-Perthes disease.

Experience:       This was a Board certified physician with
                  approximately three years of post-residency practice.

Action:           Failed to discuss thoroughly with the parents the
                  type of treatment being given and the reasons for it.
                  Parents misunderstood completely the function of the
                  brace and the build-up on the opposite shoe, and
                  thought that the surgeon was treating the wrong
                  leg and sued him for malpractice.

Why Ineffective:  Surgeon should be careful to explain thoroughly his
                  treatment and the reasons for it wherever possible.

The technique does, however, have certain weaknesses which
should be noted. In general, it can be reasonably stated that the
main function of professional education is to prepare people to
perform a certain role in society. If one uses the critical incident
technique to define this role, one risks two types of errors. First,
members of a profession may have a narrower view of the role of a
professional than do their clients or colleagues. Since patients, other
physicians and paramedical personnel are also concerned with the roles
played by orthopaedists, a critical incident study that included these
groups might uncover areas of competence which the orthopaedists ignore.
This is an important criticism and is particularly serious if the
sample from whom incidents are collected is either very small or
unusually homogeneous. The second potential source of errors is
attributable to possible bias on the part of the observer who records
the incident. While this problem is mitigated by the fact that
thousands of incidents are collected from hundreds of individuals,
the technique does not eliminate group biases characteristic of
an entire profession.

In addition to these obvious sources of error it should be
observed that there are a number of problems in developing behavioral
objectives which are not solved by the critical incident technique.
First, it provides no guidance regarding priorities since the number
of incidents recorded in any category reflects the incidence of the
behavior, not the importance of the behavior. (See Table 3.)

TABLE 3

NUMBER OF INCIDENTS REPORTED

Category                                                  Number

1. Skill in Gathering Clinical Information                    59
2. Effectiveness in Using Special Diagnostic Methods          60
3. Competence in Developing a Diagnosis                      109
4. Judgment in Deciding on Appropriate Care                  416
5. Judgment and Skill in Implementing Treatment              297
6. Effectiveness in Treating Emergency Patients               72
7. Competence in Providing Continuing Care                    84
8. Effectiveness of Physician-Patient Relationship           125
9. Accepting Responsibilities of a Physician                 523

In this study this limitation of the technique was obviated by
the decision that competence in orthopaedics is multi-dimensional
and that candidates should therefore meet minimal standards in all
behavioral categories; excellence in one area could not compensate
for deficiency in another. Thus it was unnecessary to decide whether,
for example, surgical skill is more important than diagnostic ability;
a competent orthopaedist must meet minimal satisfactory standards
in each area of competency. However, this decision, in itself,
created other practical problems since it was necessary to reduce
the 94 categories of behavior derived from the critical incident study
into some manageable number for purposes of professional assessment.
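
The decision that excellence in one area cannot compensate for deficiency in another describes what is often called a conjunctive standard, as opposed to a compensatory one based on an average. The contrast can be sketched as follows; the factor names, the 12-point scores, and the minimum level of 6 are all hypothetical values chosen for illustration.

```python
MINIMUM_LEVEL = 6  # hypothetical minimal standard on a 12-point scale


def passes_conjunctive(factor_scores, minimum=MINIMUM_LEVEL):
    """Pass only if every category meets the minimal standard."""
    return all(score >= minimum for score in factor_scores.values())


def passes_compensatory(factor_scores, minimum=MINIMUM_LEVEL):
    """Pass if the average meets the standard, so strengths offset weaknesses."""
    average = sum(factor_scores.values()) / len(factor_scores)
    return average >= minimum


# Hypothetical candidate: strong overall, deficient in problem-solving.
candidate = {
    "recall": 11,
    "observation_and_interpretation": 10,
    "problem_solving": 4,
    "relating_to_patients_and_colleagues": 9,
}

print(passes_compensatory(candidate))  # prints True  (average is 8.5)
print(passes_conjunctive(candidate))   # prints False (problem_solving is below 6)
```

Under the conjunctive rule no weighting of categories by importance is needed, which is the point the passage makes about sidestepping the question of priorities.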



The categories of performance that were finally chosen were logical
but arbitrary groupings and distillations of the original 94. It
might be that a different team of evaluators would have developed
a different set of categories. As finally agreed upon by the
Board of Orthopaedic Surgery and the research staff they consisted
of the following groupings: (1) recall of basic information; (2)
observation and interpretation of relevant data; (3) skill in
problem-solving; (4) ability to relate effectively to patients
and colleagues; (5) surgical skill; and (6) moral and ethical qualities.
These six components of competence, as defined and specified by
the critical incident study of performance in orthopaedic surgery,
have served to direct all subsequent steps in the research project.




1 John C. Flanagan, "The Critical Incident Technique," Psychological
Bulletin, July, 1954, Vol. 51, No. 4, pp. 327-358.

2 J. Michael Blum and Robert Fitzpatrick, Critical Performance
Requirements for Orthopaedic Surgery (2 volumes). Pittsburgh, Pa.:
American Institutes for Research, 1965.

CHAPTER III

ANALYSIS OF THE TECHNIQUES
CURRENTLY EMPLOYED

Once behavioral objectives are developed, it is necessary to 
review existing evaluation instruments to determine how well they 
sample the critical behaviors. This task is not easy because most 
test exercises are abstractions which sample elements of what test 
constructors believe are important prerequisites to effective 
behavior. Thus if one wishes to evaluate a chemist, one ordinarily 
does not observe him in his laboratory since such assessments are 
usually both impractical and unreliable, but instead one develops 
a test which, in theory, samples the elements of effective
performance as a chemist.

It is generally recognized that two assumptions are involved in 
assessments of this kind. The first assumption is that some exercises 
measure the recall of information which is a necessary, but not 
sufficient, requirement for effective performance. This assumption 
is rarely tested but there is a great deal of evidence that it may 
be dubious, first, because many practitioners of a profession do 
not depend wholly upon their memories but use handbooks and reference 
works and, second, because some information demanded by many tests is 
so esoteric that it is doubtful if the information is needed for any 
conceivable purpose. 

The second assumption is that some of the exercises require 
behavior which closely imitates that required for effective per- 
formance. Thus an accountant may be given an exercise which closely 
approximates reality. The testing of this assumption is, therefore,
an important aspect in assaying the effectiveness of any examination. 

It was the necessity to test these assumptions that led Bloom
and his associates to develop the system for analyzing and
classifying test exercises described in the Taxonomy of Educational
Objectives, Handbook I: The Cognitive Domain.¹ This taxonomy is based
on the premise that some test exercises demand only the recall of
isolated bits of information while others require the examinee
to demonstrate his ability to apply information to the solution of
problems. The levels of intellectual process therein outlined are
generalized to apply, in principle, to all educational levels and
are not specific to medical education. For this reason, a modification
of the Bloom taxonomy, developed by the Committee on Student
Appraisal of the University of Illinois College of Medicine and
adapted to medical education (see Appendix 2), was employed to analyze
both the written and oral examinations prepared by the American Board
of Orthopaedic Surgery.

EXHIBIT III: A Taxonomy of Intellectual Processes²

Level 1:  Recall and recognition of information

Level 2:  Selection of a relevant generalization to
          explain specific phenomena

Level 3:  Problem solving of a familiar type requiring
          simple interpretation of data or the application
          of a single principle or a standard
          combination of principles to a situation of
          a familiar type

Level 4:  Problem solving of an unfamiliar type requiring
          analysis of data or the application of a unique
          combination of principles to solve a problem
          of a novel type

Level 5:  Evaluation of a total situation

Level 6:  Synthesis of a variety of elements of knowledge
          into an original and meaningful whole

Analysis of Written Examination

Process Analysis

Utilizing the hierarchically ordered classification system
shown in Exhibit III, a Task Force consisting of four orthopaedic
surgeons and two test specialists met and independently rated
each question in the January 1964 Orthopaedic Certification
Examination and the May 1964 Orthopaedic Certification Examination
Part I according to the highest intellectual process which the
"typical" candidate would need to employ in responding to the
422 questions comprising these two examinations.* As shown in
Table 4 there was complete agreement among all four raters on
about half the items and substantial disagreement on about
one-fourth. Further analysis revealed that the disagreements
were attributable to the following factors:

* See Appendix 3 for working papers and instructions to this Task Force.



1. Most of the disagreements occurred in the first set
of items which were independently classified by these
experts, at a point when they were as yet unfamiliar
with the general approach and the specific system of
classification.

2. The hierarchical nature of the system of classification
created problems; some raters were inclined to rate
questions according to the predominant process involved,
rather than, as previously agreed, according to the
highest level required.

3. Many items were part of a series all of which were based
on data presented in the form of a clinical situation.
Some task force members initially adopted the practice
of classifying all items in such a group at the highest
level required at any point in the analysis of the
situation, whereas other raters followed the practice
of according this very high rating to only one or two
questions (and the specific items so classified varied
from expert to expert) on the ground that once these
questions had been answered other items in the group
(e.g., about therapy, next diagnostic steps, etc.)
involved only recall or generalization about the
disease entity described.

4. On other items there was either uncertainty or difference
of opinion about the nature of the experience most
candidates would have had with a specific type of
clinical problem described and hence about the process
the "typical" candidate would need to employ.

5. Finally, some items were classified differently by the
several members of the Task Force because the formulation
of the question or the alternatives presented
difficulties of interpretation that were artifacts of the
wording and not inherent in the question being posed.

Disagreements arising from the first three sources noted above
and some arising from the fourth were readily resolved in the group
sessions following the independent rating of the first 233 questions.
Principles of classification evolved in these discussions clarified
the categories and appeared to reduce potential disagreements in
subsequent classifications.



TABLE 4: RATER AGREEMENT IN 1964
PROCESS ANALYSIS OF MULTIPLE CHOICE QUESTIONS

[Table body not recoverable from the source; surviving figures: 600, 249, 26, 249, 57, 1416, 100.1, 14, 100.1]

* Among four raters on Part II and three on Part I
# Not applicable

The final results, summarized in Table 5, indicate that:

1. There was substantial agreement among raters on
75% of the 422 items reviewed;

2. Over half the items were unanimously believed to
require only recall of information;

3. Fewer than 25% of the items were thought by any
expert to involve even simple interpretation of
data, application of principles or evaluation;

4. Only four of the items were thought by any rater
to involve evaluation of a total situation;

5. No item was thought to require synthesis.
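
The kind of tally behind these findings can be sketched as follows: each item carries one taxonomy level (1 to 6) per rater, and the summary counts unanimous items, items unanimously rated as recall, and items that any rater placed at Level 3 or higher. The ratings below are invented for illustration and do not reproduce the Task Force's data; the function name is likewise hypothetical.

```python
def summarize_ratings(ratings):
    """Summarize process-level ratings.

    `ratings` holds one tuple per item, with one taxonomy level (1-6)
    per rater. Returns proportions of items in each summary class.
    """
    n_items = len(ratings)
    unanimous = sum(1 for r in ratings if len(set(r)) == 1)
    unanimous_recall = sum(1 for r in ratings if set(r) == {1})
    any_level_3_plus = sum(1 for r in ratings if max(r) >= 3)
    return {
        "unanimous": unanimous / n_items,
        "unanimous_recall": unanimous_recall / n_items,
        "any_rater_level_3_or_higher": any_level_3_plus / n_items,
    }


# Hypothetical ratings of eight items by four raters.
items = [
    (1, 1, 1, 1),
    (1, 1, 1, 1),
    (1, 1, 2, 1),
    (2, 2, 2, 2),
    (1, 3, 2, 1),
    (1, 1, 1, 1),
    (3, 3, 4, 3),
    (1, 2, 1, 1),
]

print(summarize_ratings(items))
```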

Statistical Analysis

The results of the process analysis reported above were further
substantiated by subsequent factor analytic studies of the 1966 and 1968
final Orthopaedic Certification Examinations. In addition to conventional
written and oral components, these examinations included new assessment
techniques deliberately designed to evaluate abilities not adequately
assessed by the traditional methods. If the rationale used in analyzing
the old techniques and developing the new is correct, then it is logical
to predict that the old and new techniques would load on different
factors. As reported in Tables 6 and 7 such is indeed the case: both
the 1966 and 1968 examinations show a similar factor pattern in which
the conventional techniques have high loadings on one factor and most of
the new techniques have high loadings on other factors.
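
How such a factor pattern is read can be sketched with a few rows of loadings. A score's communality is the sum of its squared loadings across the factors, and the factor with the largest absolute loading indicates where the score mainly falls. The loadings below are as best they can be read from Table 6, whose layout is damaged in this copy, so the row assignments should be treated as an assumption; the function names are hypothetical.

```python
FACTORS = ("I", "II", "III")

# Loadings on rotated factors I, II, III (assumed reading of Table 6).
loadings = {
    "Multiple Choice": (0.70, 0.25, 0.09),
    "Short Answer": (0.65, 0.13, -0.03),
    "Conventional Oral: Pathology": (0.58, -0.14, -0.20),
    "Written Simulation III: Diagnostic (Historical, Physical)": (-0.04, 0.65, -0.18),
    "Simulated Interview: Diagnostic Interview Overall": (0.33, 0.06, -0.78),
}


def communality(row):
    """Share of a score's variance accounted for by the common factors."""
    return sum(a * a for a in row)


def dominant_factor(row):
    """Label of the factor with the largest absolute loading."""
    index = max(range(len(row)), key=lambda i: abs(row[i]))
    return FACTORS[index]


for name, row in loadings.items():
    print(f"{name}: communality {communality(row):.2f}, factor {dominant_factor(row)}")
```

With these values the multiple choice, short answer and conventional oral scores fall on Factor I while the written simulation and simulated interview scores fall on Factors II and III, and the computed communalities agree with the tabled ones to within about .01, the difference being rounding in the printed loadings.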



Data on the concurrent validity of the several techniques also
support the conclusions based on the process analysis of the written
examinations. Repeatedly such studies indicate that the score on the multiple
choice component of any orthopaedic examination is the best predictor of
supervisors' ratings of residents on such factors as "recall of factual
information" or "ability to gather information," but that it is less
valid than other techniques as a predictor of their ratings on such
factors as "problem-solving skill." (The detailed discussion of the
construct and concurrent validity of the multiple choice technique is
included in Chapter VIII.)



In summary, there is little doubt that the multiple choice technique
as it was employed by the American Board of Orthopaedic Surgery prior to
the current study measured mainly the recall of information and that other
techniques were needed to assess the wide range of abilities included in
the definition of competence derived from the critical incident study.

TABLE 6

FACTOR LOADINGS ON THREE ROTATED COMMON FACTORS
OBTAINED BY PRINCIPAL COMPONENTS ANALYSIS
OF FOURTEEN SUBSCORES ON THE
JANUARY 1966 FINAL CERTIFICATION EXAMINATION

                                                            Factor
Scores                                   Communality    I      II     III

Cumulative Percent of Total Variance                    21     32     43
Cumulative Percent of Common Variance                   48     75    100

Multiple Choice                              .55       .70    .25    .09
Short Answer                                 .43       .65    .13   -.03

Written Simulations
  Problem I
    Diagnostic Proficiency (Laboratory)      .39       .14    .13    .10
  Problem II
    Treatment Proficiency                    .23       .21    .15    .40
  Problem III
    Diagnostic Proficiency
      (Historical Physical)                  .46      -.04    .65   -.18
    Diagnostic Proficiency (Laboratory)      .63       .13    .79   [illegible]
    Treatment Proficiency                    .12       .20    .08    .27

Conventional Orals
  Pathology                                  .39       .58   -.14   -.20
  Children's                                 .39       .59    .15    .16
  Anatomy and Trauma                         .37       .60    .10    .03
  Adult                                      .44       .61    .12   -.21

Simulated Patient Interviews
  Diagnostic Interview Overall               .72       .33    .06   -.78
  Proposed Treatment Interview Overall       .69       .42    .01   -.71

Simulated Patient Management
  Conference Overall                         .23       .47   -.11    .00

TABLE 7

FACTOR LOADINGS ON FIVE ROTATED COMMON FACTORS
OBTAINED BY PRINCIPAL COMPONENTS ANALYSIS OF 14 SUBSCORES ON THE
JANUARY 1968 FINAL ORTHOPAEDIC CERTIFICATION EXAMINATION

                                                              Factor
Scores                                 Communality    I      II     III    IV     V

Cumulative Percent of Total Variance                  17     31     41     52     62
Cumulative Percent of Common Variance                 27     50     67     84    100

Rating Factors
  Information Gathering                    .89       .90    .17    .05    .09    .00
  Problem Solving                          .84       .88    .21    .07    .10    .03
  Patient Relationships                    .71       .83   -.01    .09    .07   -.06

Multiple Choice
  Recall                                   .63       .13    .76    .17   -.00    .12
  Problem Solving                          .54       .10    .72    .10   -.07    .04

Oral Tests
  Trauma - Problem Solving                 .44       .09    .56   -.04    .38   -.05
  Adult - Problem Solving                  .41       .07    .62   -.11    .34   -.11
  Child - Problem Solving                  .56       .04    .02    .21    .71   -.01
  Observation and Interpretation -
    Interpretation                         .41       .19    .17    .41    .36    .24
  Simulations - Attitudes                  .57       .12    .11   -.09    .73    .01

Written Simulation Exercises
  Diagnostic: Select
    Indicated Procedures                   .77       .07    .28    .65   -.01   -.51
  Diagnostic: Avoid
    Contra-indicated                       .73       .02   -.14   -.12   -.11    .83
  Treatment: Select
    Indicated Procedures                   .73       .08   -.04    .85    .04   -.00
  Treatment: Avoid
    Contra-indicated                       .52      -.04    .29    .09    .13    .64

Oral Examinations

Prior to the current study, certification procedures included an
oral examination composed of five half-hour segments, one each in
Adult Orthopaedics, Children's Orthopaedics, Trauma, Pathology and
Anatomy. Each segment was administered and scored by a team of two
examiners. The examiners were responsible for developing questions
and supplying any related case materials.

The spirit of these examinations is characterized by the following
quotation from the brief set of instructions supplied to all examiners:

     Try to put a nervous examinee at ease by conversation
     other than that which pertains to residency and
     practice. Be fair, do not dwell too long on one subject
     and cover a variety of materials. It should be apparent
     very soon whether or not a candidate can answer a question.
     Try to find out how much [illegible]

Given these instructions, it is reasonable to suppose that these
examinations were designed to sample breadth of information and to
predict that they measured predominantly the recall of information.

Process Analysis 

In an effort to make a systematic empirical assessment of the
intellectual processes sampled by the orals, an observational study,
analogous to the process analysis made on the written examination,
was conducted on a random sample of the over 2,000 individual oral
examinations administered as part of the regular January, 1965
certification procedures.

A team of eight observers (one orthopaedist, two general surgeons,
[illegible] and three professionals in educational evaluation) was
trained in systematic observational analysis. For each observation
they recorded the following information on a form specifically
developed for this study (See Appendix 4):

     A verbatim record of each question, the time it was
     asked, a list of associated visual stimuli (e.g. X-rays,
     slides), the taxonomic level of the question (i.e.
     recall, interpretive skill or problem solving), a tally
     of the number of times the candidate supported his
     answer (e.g. with an appeal to authority, experience,
     demonstration, or data), the amount of cueing provided
     by the examiner, the initial score reported by each
     examiner and any comments by the examiner or the
     observer that would clarify the nature of the examination.

Although it is unlikely that the observers were able to record
all of the questions asked, 6,868 were recorded in the 158 half-hour
observations (including ten duplicate observations), and each was
classified according to the intellectual process it seemed to elicit
from the candidate. As indicated in Table 8, the degree of
inter-observer agreement in the classification of questions was sufficiently
high to assure reliable results. These results, summarized in Table 9,
reveal that:

1. Overall, nearly 70% of the questions appeared to sample
only the recall of isolated fragments of information and
in one discipline, anatomy, over 90% of the questions
were of this type;

2. Fewer than 20% of the questions required the candidate
to demonstrate skill in interpreting clinical data
(predominantly X-ray);

3. Only 13% of the examiner-candidate exchanges appeared to
involve any element of problem solving; and

4. In fewer than 2% of the responses did candidates cite
authoritative sources and in only 0.2% of the exchanges
did they refer to specific data to support an answer.

It thus appears reasonable to conclude that the traditional oral
examinations measure about the same type of competence as the traditional
written examinations, and both assess predominantly the ability to recall
isolated bits of information.
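
The observation form and the resulting percentages can be pictured as a simple record-and-tally structure. The sketch below is only an illustration: the class name, field names and sample records are hypothetical, and the examiners' scores and free-text comments that the form also carried are left out for brevity.

```python
# Illustrative sketch only: all names and sample records are hypothetical.
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Level(Enum):
    """The three taxonomic levels named in the text."""
    RECALL = "recall"
    INTERPRETIVE_SKILL = "interpretive skill"
    PROBLEM_SOLVING = "problem solving"


@dataclass
class QuestionRecord:
    text: str                      # verbatim question
    time_asked: str                # clock time noted by the observer
    level: Level                   # observer's classification
    visual_stimuli: List[str] = field(default_factory=list)
    support_tally: int = 0         # times the candidate supported his answer
    cueing_note: str = ""          # observer's note on examiner cueing


def level_shares(records):
    """Percentage of recorded questions at each taxonomic level."""
    if not records:
        return {}
    counts = Counter(r.level for r in records)
    return {lvl: 100.0 * counts[lvl] / len(records) for lvl in Level}


# Invented records, used only to show the tally.
SAMPLE = [
    QuestionRecord("Name the nerve at risk here.", "09:02", Level.RECALL),
    QuestionRecord("Where does this muscle insert?", "09:03", Level.RECALL),
    QuestionRecord("What does this film show?", "09:05",
                   Level.INTERPRETIVE_SKILL, ["X-ray"]),
    QuestionRecord("How would you manage this patient?", "09:09",
                   Level.PROBLEM_SOLVING, support_tally=1),
]

if __name__ == "__main__":
    for lvl, pct in level_shares(SAMPLE).items():
        print(f"{lvl.value:20s} {pct:5.1f}%")
```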

Statistical Analyses of the Conventional Oral 

The results of the process and observational analyses of the 
traditional examinations led to a decision to subject the orals 
to extended statistical study during the remaining three years of 
the project. Statistical data derived from these studies of the 
reliability and validity of the oral examination are summarized below. 3 

Reliability of the Oral Examination

In principle, there are two major sources of unreliability in
the conventional oral: one is due to errors of rating (i.e.
different judges will assign different scores to the same performance)
and the other is attributable to errors of sampling (i.e. different
examiners will pose different questions to the various candidates).

TABLE 8

INTER-OBSERVER AGREEMENT
IN PROCESS ANALYSIS OF 1965 ORAL EXAMINATIONS

An estimate of the first source of error (i.e. interrater reliability)
can be obtained by correlating independent scores of two examiners,
both of whom are judging the same performance of a series of
candidates. An estimate of the combined effects of the two sources
of error (i.e. sampling reliability) can be obtained
by correlating the scores of a series of candidates on two different
examinations both of which purport to measure the same thing.
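
The correlation referred to here can be sketched as an ordinary product-moment (Pearson) coefficient between two examiners' scores for the same candidates. The report does not name the coefficient it used, so the choice of Pearson's r is an assumption, and the function name and scores below are hypothetical.

```python
# Illustrative sketch only: Pearson's r is ASSUMED; the scores are invented.
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores given by two examiners to the same six candidates.
examiner_a = [72, 85, 64, 90, 78, 69]
examiner_b = [70, 88, 60, 85, 80, 75]

if __name__ == "__main__":
    print(f"interrater r = {pearson_r(examiner_a, examiner_b):.2f}")
```

For interrater reliability both lists come from examiners watching the same performance. For the combined estimate the two lists would come from two separate examinations of each candidate.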

An estimate of interrater reliability was obtained by having a team of
two examiners administer a single half-hour oral examination in adult
orthopaedics to each of thirty selected residents at the time of the
1966 In-Training Examination. The correlation between the scores of the
two examiners on this series was .72. Under these conditions pooling of
the scores of the examiners would result in a rating reliability of .90
for the half-hour oral under study. To estimate the combined effects of
rating and sampling errors, two half-hour orals in adult orthopaedics
were administered to a sample of 25 residents by two different
examiners. The correlation between scores of the two examiners was .54.
Pooling of the scores would yield a coefficient of reliability of .67.
This estimate, obtained from a population that included residents at all
levels of training, was somewhat higher than comparable estimates
obtained from the more homogeneous population of Board candidates.

On the basis of these findings two modifications were made in the
administration of the oral examination:

1. Since the number of trained examiners was limited and since
the sampling disagreements seemed a greater source of error
than rating disagreements, the decision was made to have
each examination administered by one examiner (rather than
a team of two examiners) in order to maintain or increase
the number of independent examinations.

2. The pass-fail decision was to be based on the pooled scores
from all oral examinations rather than on scores from each
examination considered separately.
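
The difference between the two decision rules in the second modification can be shown with a small sketch. The report does not say how scores were pooled or what the passing standard was, so the simple mean, the cut score of 70 and the candidate's scores below are all hypothetical.

```python
# Illustrative sketch only: pooling by simple mean, the cut score and the
# sample scores are ASSUMPTIONS, not taken from the report.
def pooled_decision(scores, cut):
    """Pass if the mean of all oral scores reaches the cut score."""
    return sum(scores) / len(scores) >= cut


def separate_decision(scores, cut):
    """Pass only if every oral, taken on its own, reaches the cut score."""
    return all(score >= cut for score in scores)


# One hypothetical candidate's scores on five half-hour orals.
candidate = [82, 64, 75, 78, 71]
CUT = 70

if __name__ == "__main__":
    print("pooled rule passes:  ", pooled_decision(candidate, CUT))
    print("separate rule passes:", separate_decision(candidate, CUT))
```

Under the pooled rule a single low score, which may reflect only the questions one examiner happened to ask, no longer decides the outcome by itself.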

Validity of the Oral Examinations

The construct validity of the traditional oral examination was
studied by testing the following hypotheses: (1) Higher scores will
be associated with greater amounts of education and experience; (2) Assuming
that the oral is designed to measure components of competence other
than that measured by the multiple choice examination, factor
analysis will show the two loading on different factors; and (3)
Assuming that both types of examinations are representative samples
of the content specified, correlations between corresponding subject
matter sub-tests of the oral and written will be higher than
correlations between other sub-tests. Data relevant to the first
hypothesis are summarized in Tables 10 and 11. These data were obtained
from the 1966 In-Training Examination, in which one conventional
oral in Adult Orthopaedics was administered to a selected sample of
233 residents at all levels of training. The results indicate that,
as a group, fourth year residents perform substantially better than
residents with less training, a finding similar to that obtained on
the multiple choice examination (See Section Two, Chapter VIII).
However, although mean scores increase with increased amounts of
training there is substantial overlap in performance from year to
year. As indicated in Table 11, at least 20% of the second year
residents scored higher than the mean score of fourth year residents.



TABLE 10

MEAN SCORES BY LEVEL OF TRAINING
1966 IN-TRAINING EXAMINATION

Level of Training       N      Mean Score     SD

1st year                29        65%          9%
2nd year                75        70%         13%
3rd year                50        75%         12%
4th year                79        80%         10%

Total                  233        74%         12%

Between group differences significant at .01 level by means of ANOVA
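
The overlap between training levels can be checked roughly from the means and standard deviations in Table 10 alone. The sketch below assumes that scores within a group are approximately normally distributed, which the report does not claim; Table 11 gives the empirical distribution, so this is only a plausibility check. The function name is hypothetical.

```python
# Illustrative sketch only: normality within each group is an ASSUMPTION.
from statistics import NormalDist


def share_above(mean, sd, threshold):
    """Proportion of a normal(mean, sd) group scoring above threshold."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold)


# Table 10: second year mean 70, SD 13; fourth year mean 80.
second_year_above_fourth_mean = share_above(70.0, 13.0, 80.0)

if __name__ == "__main__":
    print(f"{100 * second_year_above_fourth_mean:.0f}% of second year "
          "residents would exceed the fourth year mean")
```

Under this assumption a little over a fifth of second year residents would score above the fourth year mean, in line with the "at least 20%" read from Table 11.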



Data on the second hypothesis were obtained from factor analytic
studies of the 1966 and 1968 Orthopaedic Certifying Examinations.
These studies provided substantial evidence in support of the conclusions
of the earlier process analysis of the conventional multiple choice and
oral examinations, in that in the 1966 Certifying Examination the two
types of examinations showed high loadings on the same factors, whereas,
after new types of both written and oral examinations had been introduced
in the 1968 Certifying Examination, the factor structure was
considerably more complex and the various components of the examination
loaded on somewhat different factors. Further evidence especially
relevant to the third hypothesis regarding the inter-relation among
sub-test scores was obtained from a study of the 1967 Certification
Examination. Since both the conventional multiple choice and the
conventional orals were organized on the basis of disciplinary areas
it would seem reasonable to assume that the multiple choice subtest
in the written should correlate higher with the oral presumed to assess
the same content area than with other orals. Table 12 indicates that
this assumption is not valid and strongly suggests that the orals are
probably measuring a general recall ability independent of any specific
content area.

TABLE 11

DISTRIBUTION OF SCORES ON CONVENTIONAL ORAL
IN ADULT ORTHOPAEDICS BY LEVEL OF TRAINING*

* ... examination. The charts indicate the distribution of these scores for each
group of residents. The numbers in the percentile column indicate what
percentage of the residents got scores below the score on the left. For example,
a resident with 14 mos. of experience who received a score of 86 did better
than 90% of his fellow residents with equivalent experience.
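
The percentile column described in the note to Table 11 can be sketched as a simple count of group members scoring below a given score. The function name and the group of ten scores are hypothetical and chosen only so that a score of 86 falls at the 90th percentile, as in the note's example.

```python
# Illustrative sketch only: the function name and scores are hypothetical.
def percentile_rank(score, group_scores):
    """Percentage of the group whose scores fall below the given score."""
    if not group_scores:
        raise ValueError("group_scores must not be empty")
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Invented scores for ten residents with equivalent experience.
group = [58, 62, 66, 70, 72, 75, 78, 80, 84, 86]

if __name__ == "__main__":
    print(f"a score of 86 is at percentile {percentile_rank(86, group):.0f}")
```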



TABLE 12

INTERCORRELATIONS BETWEEN ORAL EXAMINATION SCORES AND
SCORES ON OTHER EVALUATION TECHNIQUES,
1967 ORTHOPAEDIC CERTIFICATION EXAMINATION

N = 351

                   Multiple Choice      PMP*                Oral Examinations
Oral                                    Pro-                                     Basic
Examinations       Total    Adult     ficiency     Adult   Children's   Trauma   Science

Adult               .33      .27        .09          -        .19         .34      .26
Children's          .41      .29       -.04         .19        -          .37      .33
Trauma              .33      .27        .11         .34       .37          -       .31
Basic Science       .41      .26        .11         .26       .33         .31       -

* Written simulations of Patient Management Problems (PMP); for a
description see Section Two, Chapter VI.
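
The third hypothesis amounts to a simple comparison within a column of Table 12. The sketch below applies it to the correlations of the Adult multiple choice sub-test with the four orals, as transcribed from the table; the names are hypothetical and the values are subject to transcription error.

```python
# Illustrative sketch only: names are hypothetical; values are transcribed
# from the "Multiple Choice: Adult" column of Table 12.
MC_ADULT_VS_ORALS = {
    "Adult": 0.27,
    "Children's": 0.29,
    "Trauma": 0.27,
    "Basic Science": 0.26,
}


def matching_oral_is_highest(correlations, matching):
    """True only if the oral in the same content area correlates more
    highly with the written sub-test than every other oral does."""
    target = correlations[matching]
    return all(target > r for name, r in correlations.items()
               if name != matching)


if __name__ == "__main__":
    print(matching_oral_is_highest(MC_ADULT_VS_ORALS, "Adult"))
```

The check fails for the Adult sub-test, since the Children's oral correlates at least as highly with it as the Adult oral does, which is the sense in which the assumption is said not to hold.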



Studies of the concurrent validity of various examination
techniques, including the conventional oral, were made in connection
with the 1966 In-Training Examination which, in addition to
conventional multiple choice questions, included a set of simulated problems
in patient management (PMP) in the written examination administered to
all candidates, and a one hour oral examination administered to 233
selected candidates at all levels of training. This oral examination
was divided into three parts: a 30-minute conventional examination
in Adult Orthopaedics, a 20-minute simulation of a "diagnostic
interview" and a 10-minute simulation of a "proposed treatment"
interview with a programmed patient (See Section Two, Chapter VII).
Scores on these several components were correlated with training
chiefs' ratings of each resident on ten factors representing various
aspects of competence. The results shown in Table 13 are by no means
easy to interpret, in part because of the differing reliabilities of
the four techniques.

TABLE 13

CORRELATIONS BETWEEN SUPERVISORY RATINGS AND
EVALUATION TECHNIQUES, 1966 IN-TRAINING EXAMINATION

                              First and Second              Third and Fourth
                           Year Residents (N=107)     Year Residents (N=11[illegible])
Sub-Tests                  Rating of    Rating of       Rating of    Rating of
by Technique               Problem      Overall         Problem      Overall
                           Solving      Competence      Solving      Competence

Proposed Treatment
  Interview                  .10          .00             .15          .20
Diagnostic Interview         .14          .12             .23          .16
Adult Oral                   .15          .09             .35          .23
Multiple Choice              .23          .20             .26          .26



For example, for third and fourth year residents the Diagnostic
Interview, which is characterized by very low reliability, is almost
as good a predictor of ratings of Overall Competence as is the
multiple choice sub-test, which is characterized by relatively high
reliability. Similarly, for this same group, the conventional oral
in Adult Orthopaedics is a slightly better predictor of ratings of
problem solving ability than is the significantly more reliable
multiple choice sub-test. These latter results may be explained in
part by the fact that there is heavy emphasis on X-ray interpretation
in the oral on Adult Orthopaedics, and X-ray interpretation may play
an important role in the training chiefs' definition of problem
solving ability.

Results from a subsequent multiple correlational analysis of
scores on the 1966 In-Training Examination as predictors of resident
ratings by training chiefs are summarized in Tables 14A and B. These
data indicate that scores on the conventional oral in Adult Orthopaedics
add very little to the prediction of ratings on Factual Information and
Overall Competence obtained from the multiple choice examination; in
contrast, the two types of tests contribute about equally to the
prediction of ratings of problem solving. It is also of interest to note
that scores on the oral in Adult Orthopaedics contribute essentially
nothing in the prediction of ratings on effectiveness in Patient
Relationships and in Colleague Relationships despite the fact that
the oral confrontation could be expected to be indicative of abilities
in these areas. In contrast, the new types of orals, especially those
that involve simulated interviews with programmed patients, appear to
make major contributions in the prediction of these behaviors.

TABLE 14

SUMMARY OF RESULTS OF MULTIPLE CORRELATIONAL
ANALYSIS USING SUB-TEST SCORES AS INDEPENDENT
VARIABLES AND RATING FACTORS AS DEPENDENT VARIABLES

1966 IN-TRAINING EXAMINATION

A. Third and Fourth Year Residents
N = 119

Dependent Variable                       Independent Variable                  Partial
(Rating Factors)         R       F       (Test Scores)                            r        F

Factual Information     .39    2.80**    Multiple Choice Total                   .26     7.86**
                                         PMP, Treatment Problem                 -.18     3.50
                                         Oral, Adult Orthopaedics                .14     2.14

Information Gathering   .53    6.06**    Multiple Choice Total                   .37    17.75**
                                         Oral, Adult Orthopaedics                .22     5.37*
                                         PMP, Treatment Problem                 -.21     5.18*
                                         PMP, Diagnostic Problem                 .19     4.25*

Clinical Judgment       .37    2.47*     Multiple Choice Total                   .18     3.72
                                         Oral, Adult Orthopaedics                .16     2.76
                                         PMP, Diagnostic Problem                 .14     2.07
                                         Oral, Proposed Treatment Interview      .14     2.05

Surgical Skill          Not Significant

Patient Relations       Not Significant

Colleague Relations     .35    2.17*     Multiple Choice Total                   .23     6.00*
                                         PMP, Treatment Problem                 -.15     2.40

Ethics                  .37    2.51*     Multiple Choice Total                   .25     7.31**
                                         PMP, Treatment Problem                 -.18     3.63
                                         Oral, Adult Orthopaedics                .16     2.39

Overall Competence      .40    2.94**    Multiple Choice Total                   .21     5.21*
                                         Oral, Adult Orthopaedics                .17     3.22*
                                         PMP, Treatment Problem                  .16     2.91

B. Total Resident Group
N = [illegible]

Dependent Variable                       Independent Variables                 Partial
(Rating Factors)         R       F       (Test Scores)                            r        F

Factual Information     .35    5.19**    Multiple Choice Total                   .23    12.80**
                                         Oral, Adult Orthopaedics                .10     2.29

Problem Solving         .37    7.47**    Oral, Adult Orthopaedics                .18     7.47*
                                         Multiple Choice Total                   .15     5.01*
                                         Oral, Proposed Treatment Interview      .13     3.52
                                         PMP, Treatment Problem                 -.11     2.82

Information Gathering   .39    6.78**    Multiple Choice Total                   .25    14.50**
                                         PMP, Treatment Problem                  .15     5.38*
                                         Oral, Adult Orthopaedics                .14     4.23*

Clinical Judgment       .34    4.67**    Multiple Choice Total                   .17     6.64*
                                         Oral, Adult Orthopaedics                .13     3.87
                                         Oral, Proposed Treatment Interview      .10     2.17

Surgical Skill          .25    2.36*     Multiple Choice Total                   .12     3.44
                                         Oral, Adult Orthopaedics                .12     3.32

Patient Relations       Not Significant

Colleague Relations     .24    2.23*     PMP, Treatment Problem                 -.13     3.59
                                         Oral, Proposed Treatment Interview      .12     2.95
                                         Multiple Choice Total                   .10     2.38

Ethics                  .25    2.45*     Multiple Choice Total                   .16     6.05*
                                         PMP, Treatment Problem                 -.15     5.18*

Overall Competence      .32    4.16*     Multiple Choice Total                   .18     7.26*
                                         Oral, Proposed Treatment Interview      .10     2.41

*  significant at .05 level
** significant at .01 level

In summary, studies of the traditional orals strongly suggested,
first, that they are slightly, though not significantly, more
indicative of problem solving and interpretive skills than is the
conventional multiple choice examination, and secondly, that the manner
in which they are ordinarily used to determine competence entails
certain serious, inherent flaws. Revision of the technique to
preserve and enhance its values while minimizing its deficiencies
was deemed to be clearly indicated. The new techniques of oral
examining developed to meet these needs are described in Section Two,
Chapter VII below.

1 Benjamin S. Bloom, ed. The Taxonomy of Educational Objectives.
Handbook I: Cognitive Domain. New York: David McKay Co., 1956.

2 Christine McGuire, "The Oral Examination as a Measure of
Professional Competence." Journal of Medical Education
41:267-274, March, 1966.

3 H. Levine and J. Noak, "The Evaluation of Complex
Outcomes." Office of the Superintendent of Public Instruction,
State of Illinois, 1968.

SECTION TWO

DEVELOPMENT AND ANALYSIS OF
NEW EVALUATION TECHNIQUES

CHAPTER IV

STATEMENT OF THE PROBLEM

The critical incident study had indicated that effective behavior
as an orthopaedist required the achievement of minimum competence in 94
specific categories of behavior. The analysis of the evaluation
techniques used by the Board revealed serious gaps in the procedures used
to assess those competencies. The study team was faced with a challenge
to devise new techniques or revise old ones in order to assess these
competencies, to conduct studies on the validity and reliability of the
new techniques, and to assist the Board in incorporating the new
techniques in its certification procedures. Furthermore, it was necessary
to carry out this research within the framework of the regular
examination program conducted by the Board for purposes of certification
and by the Academy for purposes of training.



Alternative Approaches to the Development of Assessment Techniques

In initiating the design of such new instruments three possible
approaches were considered:

The analytical approach, which requires that the elements of the
behavior to be measured be carefully defined and means be developed
to sample as many of these elements as possible. For example, the
ability to recall information about various diseases is one element
in the ability to diagnose the causes of a particular constellation
of findings; one test of recall will therefore provide information
about a necessary but insufficient condition for accomplishing the
main objective. Utilizing this approach it is relatively easy to
develop exercises in sufficient number to provide a reliable sample
of at least one of the specific behavioral elements to be assessed.
However, this approach is limited in two respects: first, mastery
of the elements of a behavior does not necessarily assure mastery
of the total behavior; second, the elements of a complex
behavioral pattern as derived by logical analysis may not be
empirically verifiable. For example, diagnosis of a medical problem may
actually require much less immediately retrievable information than
"logical" analysis would lead one to believe.



The simulation approach to the assessment of complex behaviors
involves designing standardized situations that imitate reality in
requiring the examinee to demonstrate the type of behaviors one desires
to assess. This approach has the advantage of yielding a direct measure
of the complex terminal behavior rather than an assay of its
prerequisites or elements and, thus, requires fewer assumptions about the nature
of that behavior than does the analytic approach. Further, it is
subject to lesser errors of rating and sampling than direct observation of
reality. On the other hand, the simulation approach suffers from certain
disadvantages: First, to the extent that a "real life" setting differs
from a simulation, the simulation may lack validity in predicting
behavior in the former situation. Second, because of their complexity,
simulation exercises are more difficult to construct and score than
simple analytical ones. Third, a simulation exercise requires more
testing time per unit than do most conventional analytical exercises and
thus requires more total testing time to reach a given level of reliability.



The observational approach entails the design of instruments for
systematic observation of actual behavior in real situations. This
method is especially useful in assessing behaviors which are difficult
to simulate or which are not readily amenable to logical analysis into
elements. Thus observational techniques are likely to be more valid
than other methods of measurement since they obviate the necessity of
hypothesizing some relation between the observed behavior and "reality."
But these techniques also suffer from certain potential defects: First,
the presence of an observer may significantly alter the nature of the
behavior being observed; second, there are inherent problems of
sampling and rating observations that make it difficult to achieve reasonable
levels of reliability with such methods; and, finally, certain important
aspects of behavior are quite difficult to observe, even in "reality."

Further, it should be noted that the observation of behavior is of
two types: one, observation and recording of a specific sample of
behavior (e.g. a patient interview) and, two, the observation and
rating of the examinee's habitual behavior. These two types have
different advantages and disadvantages: In the first type it is often
possible to minimize examiner bias by using standardized observational
techniques and by training observers; however, since situational variables
are so important in some behaviors, it may be quite impossible to
generalize about an individual's competence from only one or two specific
observations; furthermore, the potential distortions attributable to
the presence of an observer are maximized in this type of test situation.
In contrast, observation and rating of habitual performance suffers
greatly from observer bias, and some persons who are in an ideal position
to know about habitual performance are not capable of rating it
objectively; however, when observations are made by skilled raters in a
position to observe an examinee's behavior over long periods of time,
they have maximum generalizability and validity. Finally, it is obvious
that certain aspects of performance (e.g. ethical attitudes) are not
readily susceptible to other modes of evaluation.



S pecific Instruments Developed 

In the Orthopaedic Training Study, all of th^ approaches described 
above have been employed in designing the new types of evaluation inst.. 
ments listed in Table 15. Once a new method was devised, experimental . 
instruments were developed and administered to various populations in 
order to obtain evidence regarding the validity and reliability of the 






# 






A 



3ij 






TABLE 15 



LIST 

TI1E 



OF TECHNIQUES 
ORTHO PAEDX.C 



UT1IJ./JJD 

TRAINING 



DURING 

STUDY 



TECHNIQUE 


COMMENTS | 


I. Analytical 

A. Multiple 
Choice 
Tents 


1 

i 

Those exorcises require the examinee to demonstrate 
problem solving and i nterpretivc abilities as well as 
recall of information. 


XX. Simulation 




A. Written 


These exercises require the examinee to demonstrate 
skill’ in solving diagnostic and/or treatment problems 


Simulations 




involving sequential analysis and decision; they employ 
a sx^ecial answer sheet with an erasable overlay design- 




ed to provide feedback to the examinee about the re- 
sults of his inquiries and therapeutic interventions . 


* 




B. Oral Tests 


The 5 types of exercises, described below, though not 


of Complex 


strictly simulations , resemble them in many ways and 


Cognitive 

Behavior 


share the same characteristics. 


1. Diagnostic 


This type of exercise requires the examinee to arrive j 


Problem 


at and defend a diagnosis on the basis of information 
.he elicits by inquiries made to an oral examiner. 






2. Defense of 


This type of exercise requires the examinee to present 


Therapy 


his rationale for the therapeutic decisions he makes on 


Problem 


a standardized case presented to him. 


3. Emergency 


This type of exercise require the examinee to describe 


Treatment 


the procedures he would follow in treating a specific 


Problem 


emergency case in the emergency room, and the actions 


* • 


he would take in responding to the consequences of his 




therapeutic interventions and diagnostic inquiries as 


t. 


reported by the examiner. 

»" ■ 11 ™ -- • L 1 - L J1 " J 1,1 
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4. Coiup.TxcaLj.ui) 
Problem 


Thin type of exerci se :i s simi lar to the emergency treat- 
ment problem, except that it deals with problems of 
long term care. 


5. Observation 
and Inter- 
pretation 
Exercise 


This type of exercise requires the examiner to describe 
what he sees in slides and x-rays and to relate those 
to other data about a specific case. 


C. Oral Simulations
of Interpersonal
Conferences


This type of exercise requires the examinee to play the
role of a physician while the examiner assumes the role
of a patient, colleague, or paramedical person in typical
interpersonal confrontations in the practice of
medicine; the exercises are designed to evaluate the
candidate's ability to communicate with and relate
effectively to patients, colleagues and paramedical
personnel.


D. Oral Simulations
of Group
Conferences


This exercise requires 5 examinees to simulate a staff
conference on the management of 2 specific cases.


OBSERVATIONS



A. Habitual 
Performance 
1. Candidate 
Evaluation 
Form 


This form is designed to obtain supervisors' ratings of
various aspects of the habitual performance of candidates
for Board certification.


2. Resident 
Evaluation 
Form 


This form is similar to the Candidate Evaluation Form,
but is designed to obtain supervisors' ratings of
resident performance.


B. Samples of 
Performance 
1. Rating Form 
for Assess- 
ing Surgical 
Skills


This form is a detailed checklist to be used in the
observation of any surgical procedure.


2. Rating Forms 
for Evalua- 
ting Behav- 
ior in Oral 
Examinations 


These forms specify and define the criteria to be employed
in rating various aspects of performance on each
type of oral examination.

techniques. As noted earlier, most of these reliability and validity
studies were carried out in connection either with the final Certification
Examination administered by the American Board of Orthopaedic Surgery for
purposes of specialty board certification to all candidates who have
completed an orthopaedic residency and one year of practice, or in
connection with the In-Training Examination administered by the American
Academy of Orthopaedic Surgeons for diagnostic and feedback purposes to
virtually all residents in United States training programs. The
specific examinations on which studies were conducted are listed in
Table 16.



Data Analysis



Studies of the reliability of the newer objective techniques (i.e.,
the multiple choice and written simulation exercises) were conducted by
employing adaptations of conventional methods for estimating internal
consistency. Studies of the interrater reliability of techniques
involving subjective judgments (i.e., orals and rating forms) were
conducted by obtaining and correlating independent ratings of two observers
judging the same behavioral sample; estimates of the combined effects of
rating and sampling errors were obtained by correlating the independent
ratings of two observers judging different behavioral samples.
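
The interrater estimate described above amounts to a correlation between two observers' independent ratings of the same examinees. The following is a minimal sketch only: it assumes the Pearson product-moment coefficient (the text says only that the ratings were correlated), the helper name `pearson_r` is illustrative, and the ratings are invented values on a 12-point scale rather than data from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one rater never varies")
    return sxy / sqrt(sxx * syy)

# Hypothetical ratings by two observers of the same six oral examinations.
rater_a = [9, 7, 10, 6, 8, 11]
rater_b = [8, 7, 11, 5, 9, 10]
interrater_r = pearson_r(rater_a, rater_b)
```

Correlating ratings of the same behavioral sample isolates rating error; applying the same function to ratings of different samples would fold sampling error into the estimate as well.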

Data on the content validity of the several techniques were collected by
process analysis. Data on construct validity were obtained by studying
the relationship between test scores and such examinee characteristics
as age, practice setting, and amount of training, utilizing analysis of
variance methods. Additional data on construct validity were obtained
from correlational and factor analytic studies designed to test
hypotheses regarding interrelationships among scores on the various types
of exercises. Data on concurrent validity were obtained by multiple
correlational analysis of the relationships between ratings of habitual
performance and scores on the several types of exercises.

Though many techniques were, of necessity, analyzed simultaneously, 
the data on each will be presented separately and, in the interests of 
clarity, will be reported throughout this section in the following order: 

1. A description of the technique, a brief history of its 
development and some basic considerations in construction 
of exercises of the type described; 

2. Data and discussion of the reliability of the technique;

3. Data and discussion of the content, construct and
concurrent validity of the technique.

TABLE 16

EXAMINATIONS STUDIED DURING
ORTHOPAEDIC TRAINING STUDY

EXAMINATION          POPULATION          COMMENTS

(1) May 1965         400 Candidates      First extensive study of written
    Certification                        simulation exercises.

(2) Nov. 1965        1,398 Residents     First study of In-Training
    In-Training                          Examination.

(3) Jan. 1966        401 Candidates      First study of new oral examinations
    Certification                        and continuation of study of written
    Final                                simulation exercises.

                     184 Examiners       Study of construct validity of the
                                         written simulation administered to
                                         candidates.

(4) May 1966         459 Candidates      First attempt to obtain ratings of
    Certification                        habitual performance from training
                                         chiefs in a study of written
                                         simulation exercises and multiple
                                         choice questions.

(5) Nov. 1966        1,539 Residents     Validation of simulation exercises
    In-Training                          and multiple choice questions by use
                                         of ratings of habitual performance
                                         and cross-sectional analysis of
                                         performance of sub-groups.

                     233 Candidates      Orals administered to sample of
                                         residents to obtain reliability and
                                         validity data on all techniques.

(6) Jan. 1967        449 Residents       Replication of Jan. 1966 study of the
    Certification                        Final Certification examination.
    Final

(7) Nov. 1967        1,682 Residents     Replication of Nov. 1966 study of the
    In-Training                          In-Training Examinations.

(8) Jan. 1968        838 Candidates      First complete implementation of
    Final                                revised certification procedures
    Certification                        including the adoption of a profile
                                         system for reporting scores;
                                         reliability and validity studies of
                                         all techniques.

TOTAL                7,480               Examinees who participated in more
                                         than one study are tallied separately
                                         in each.




The two final chapters of this Section contain, first, a report
of the data and discussion of the interrelationships among all
techniques, and second, a description of the system developed by
the American Board of Orthopaedic Surgery for incorporating all
techniques in its current certification procedures.



CHAPTER V



OBSERVATIONAL FORMS



As noted earlier (See Table 15), most of the new types of examination
exercises developed in the Orthopaedic Training Study represent
efforts to simulate the important behaviors described in the critical
incident study. Since simulations cannot be perfect copies of reality,
it is necessary to demonstrate that the correspondence between behavior
in actual situations and in the simulated situations is such as to
justify using the latter for purposes of estimating competence in
the former. For this reason, observational techniques have been
extensively employed as criterion measures in this study, despite the
deficiencies in reliability and validity to which they are often subject.
However, to the extent that data from observations agree with data from
simulations, increased confidence can be placed in both.



Forms for Rating Habitual Performance



Ratings of habitual performance have been utilized in this study
for three major purposes: (1) to provide data on the concurrent
validity of other types of evaluation techniques; (2) to provide data on
aspects of competence not assessed by other techniques; and (3) to
educate training chiefs regarding the dimensions and complexity of
competence in orthopaedic surgery and to assist them in monitoring the
progress of residents in achieving these goals.

For these purposes two different observational forms were developed.
The Resident Evaluation Form (Appendix 5) was developed primarily as part
of the In-Training Evaluation of residents. It was designed to obtain
evidence on the following factors: ability to recall factual information
concerning general medicine and orthopaedic surgery, ability to use
information to solve problems, ability to gather clinical information,
judgment in deciding on appropriate treatment and care, skill in surgical
procedures, relating effectively to patients, relating effectively to
colleagues and other medical personnel, demonstrating the moral and
ethical standards required of a physician, and overall competence as an
orthopaedic surgeon. A revised version of this form has been incorporated
as part of the In-Training Examination and is completed each year by the
resident's chief of training (Appendix 6). The Candidate Evaluation Form
(Appendix 7) was developed for rating of candidates for certification; it
is an adaptation of the earlier Resident Rating Form designed to yield
evidence on the following factors: information gathering, problem-solving,
clinical judgment, surgical technique, relating to patients, continuing
responsibility, emergency care, relating to colleagues, moral and ethical
values, and overall competence.
These two forms represent somewhat different ways of resolving
the problems of obtaining valid and reliable ratings.

The first of these problems is that of specifying and describing
the factors on which ratings are to be made so as to avoid having
behavior of the same type classified differently by two raters. The
two sample descriptions of factors reported in Table 17 illustrate
the alternative ways in which this problem was resolved in the two
forms under discussion.

The second problem concerns the nature of the scale to be used
in recording the ratings. You will note that a 12-point scale was
utilized in both forms in order to facilitate pooling of data from a
variety of techniques for purposes of making an overall judgment of
a candidate's competence.* However, once that scale had been decided
on, different methods were employed in the two forms to try to maximize
agreement between raters regarding the meaning of each point on the
scale. The reader will note (Table 17) that in the Candidate Evaluation
Form this problem was dealt with by describing the extreme points of the
scale in behavioral terms and by placing adjectival meanings at
intermediate points. While this is a defensible procedure for ratings in
a certification examination, it creates certain difficulties because
raters are often reluctant to use the negative half of the scale. The
consequent restriction of range lowers the reliability of the ratings.

For example, among 1574 ratings on the "Overall Competence" of
candidates applying for the 1968 Certifying Examination only 5 were in
the "poor" range of the scale, only 78 in the "marginal" range, but
there were 174 additional ratings at "7" which is the point on the
scale just above "marginal".**

Because of this "error of leniency" the adjectives were eliminated
from the Resident Evaluation Form and the raters were instead instructed
to rank residents as follows:



"In filling out this form you are to rank the resident on each
factor in terms of all the residents in orthopaedic surgery
you have known during your career. You are to indicate your
rankings by checking the appropriate box under each factor.

In making these evaluations DO NOT take into account the
resident's level of training. For example, a second year
resident may have the potentiality to display outstanding
surgical skills, but many fourth year residents might function
AT THE PRESENT time on a higher level. He should be ranked
lower than they are ranked on surgical skill."

* Data from oral examinations and from observations are collected
directly on the 12-point scale. Data from written examinations (i.e.,
multiple choice and simulation exercises) are converted to the
12-point scale; for a description of the conversion technique see Section
Two C hapter y . 

** See Appendix 8 for complete set of rating data on the 1968 
Certifying Examination. 
While this technique tends to reduce the "error of leniency"
it suffers from the defect that each rater must employ standards
based upon the sample of residents he has met, and these samples
will differ from program to program. Where each judge rates a
relatively large group, errors arising from this source can sometimes be
minimized by normalizing the distribution of ratings from each observer
and converting all to standard scores; this approach was not possible
in this study due to the fact that each supervisor rated very few
cases. For this reason every effort was made to obtain multiple ratings
on each individual; this too proved to be unfeasible in this study. The
observational data presented below are therefore based on the pooled
ratings of only two supervisors.
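
The conversion to standard scores mentioned above, which could not be applied in this study, can be sketched as follows. This is an illustration under stated assumptions: it performs only the linear standardization step (subtracting each rater's own mean and dividing by his own standard deviation), not a full normalization of the distribution, and both the function name and the ratings are hypothetical.

```python
from statistics import mean, pstdev

def standard_scores(ratings):
    """Express one rater's ratings relative to that rater's own mean and SD."""
    m = mean(ratings)
    sd = pstdev(ratings)
    if sd == 0:
        raise ValueError("standard scores are undefined when all ratings are equal")
    return [(r - m) / sd for r in ratings]

# Hypothetical raters who order five residents identically, but one of whom
# uses a part of the 12-point scale four points higher than the other.
lenient = [9, 10, 11, 12, 10]
harsh = [5, 6, 7, 8, 6]
z_lenient = standard_scores(lenient)
z_harsh = standard_scores(harsh)
```

After conversion the two sets of scores coincide, so the difference in personal standards no longer affects pooled data. The step is only defensible when each rater has judged enough cases for his mean and standard deviation to be stable, which is the condition the study could not meet.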

Reliability of Observational Forms

In studying the reliability of the Resident Evaluation Form, ratings
of each resident were obtained from two supervisors in the same program,
whereas in studying the reliability of the Candidate Evaluation Form, in
general, the ratings of each candidate were obtained from training chiefs
in different programs; these data could therefore be expected to have
maximum generalizability.


The results of these studies, as summarized in Table 18, indicate
that the Candidate Evaluation Form is much less reliable than the
Resident Evaluation Form. This finding can be attributed to several
factors. First, a candidate probably does behave somewhat differently
in different programs; it is therefore reasonable to expect a lower
correlation between ratings in different programs than between those
obtained in the same program. Second, raters are more likely to agree
on standards within programs than across programs. Third, raters within
a program often discuss resident performance and thus influence each
other's ratings; this is far less likely to happen across programs.
Fourth, residents do, in fact, differ more than candidates simply because
of the greater variation among them with respect to education and
experience; this increased range of competence will in itself tend to
increase the reliability of the ratings. Finally, supervisors are less
hesitant about giving low ratings to residents than to candidates and
this, too, will increase the range, and thus increase the reliability
of resident ratings as compared with candidate ratings.

Given these considerations it seems reasonable to conclude that 
the reliability of the ratings can be significantly raised only if 
there is opportunity to increase the number of ratings per candidate 
and to institute a training program for raters. It is therefore of 
interest to note that in incorporating the rating forms in the regular 
certification procedures, the Board is doing so in a manner designed to 
produce these improvements in the collection of observational data. 






TABLE 18

RELIABILITY DATA ON RATING FORMS

A. Resident Evaluation Form (N=190)

B. Candidate Evaluation Form (N=391)

                                MEAN OF    SD OF BOTH    CORRELATION
                                BOTH       RATINGS       BETWEEN
FACTOR                          RATINGS    COMBINED      RATERS         RELIABILITY*

 1. Information Gathering         9.1        1.4           .17            .29
 2. Problem Solving               9.0        1.5           .17            .29
 3. Clinical Judgment             9.1        1.4           .19            .22
 4. Surgical Technique            9.3        1.3           .17            .29
 5. Patient Relationships         9.3        1.5           .14            .25
 6. Continuing Responsibility     9.5        1.4           .13            .23
 7. Emergency Care                9.6        1.3           .16            .28
 8. Colleague Relationships       9.4        1.5           .17            .29
 9. Moral and Ethical Values     10.2        1.4           .15            .26
10. Overall Competence            9.3        1.3           .18            .31

* Computed by the Spearman-Brown Formula.
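
The Spearman-Brown formula cited in the table footnote estimates the reliability of the pooled rating of k raters from the correlation r between single raters: k r / (1 + (k - 1) r). The sketch below applies it with k = 2, which matches the pooled two-supervisor ratings described in the text; the function names are illustrative, and the `raters_needed` helper is an added convenience that does not appear in the report.

```python
from math import ceil

def spearman_brown(r, k=2):
    """Reliability of the pooled rating of k raters, given single-rater correlation r."""
    return k * r / (1 + (k - 1) * r)

def raters_needed(r, target):
    """Smallest number of raters whose pooled rating reaches the target reliability."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("r and target must lie strictly between 0 and 1")
    return ceil(target * (1 - r) / (r * (1 - target)))

# Single-rater correlation of .17 pooled over two raters.
pooled = spearman_brown(0.17)

# Illustrative only: the target of .70 is an assumed figure, not one set by the study.
needed_for_70 = raters_needed(0.17, 0.70)
```

With r = .17 the pooled value rounds to .29, the figure the table reports for several factors. The second function indicates why a larger number of ratings per candidate is needed before such ratings become dependable.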
Validity of Observational Forms



Content Validity

Since the factors to be rated in the Candidate and
Resident Evaluation Forms were derived directly from the specific
components of competence identified in the Critical Incident Study and
refer to habitual, observable performance, these forms are, by definition,
characterized by high content validity. In short, in using rating forms
of the type described above, it is unnecessary to make any assumption
about the relation between the behaviors sampled by the instrument and
those demonstrated in "real life" situations.



Despite this obvious advantage, they, like most other rating forms,
are subject to certain deficiencies which can reduce their validity
significantly. One such defect derives from the tendency of some raters
to rate individuals who are high or low in one trait as high or low in
all traits (the halo effect). Of special importance in this regard is
the tendency to credit the person with a pleasing personality with
greater cognitive skills than he has actually achieved. Since an
important aspect of competence for physicians and other professionals
who must constantly deal with people is the ability to impress others
with their competence and dedication, this specific halo effect may
not be as serious as it seems; it nevertheless reduces the validity of
the rating to the extent that it results in inflated evaluations of
purely cognitive or psychomotor skills. Second, in using the forms
under discussion, raters are sometimes guilty of logical errors in
assuming a closer relation between certain attributes, for example,
problem-solving skills and clinical judgment, than is warranted. High
ability in one such factor may lead to undeservedly high ratings on the
other. Third, raters sometimes rate examinees on attributes of behavior
which they have not directly observed; in such cases they tend to arrive
at a judgment on the basis either of a general "halo" or the type of
logical error discussed above. For example, in this study, statistical
analysis of the ratings leads to the suspicion that in rating "information
gathering ability" supervisors were actually rating "diagnostic ability."
This suspicion is supported by the fact that residents are rarely observed
gathering information (interviewing a patient, for example). Finally,
ratings are sometimes affected as much by the inadequacy of raters as by
the ability of the examinees; some raters are systematically too lenient,
others too harsh, and others indiscriminating.

For the reasons outlined above ratings must be considered as simply
another evaluative technique and not as the ultimate criteria against
which all other evaluation techniques must be judged. However, since
ratings are obtained in a fashion so different from other evaluative
techniques, agreement in the results from two such sources supports the
view that there must be some underlying behavioral manifestations which
account for the congruence and which thus help to confirm the validity
of both the ratings and the other evaluation techniques. It is from this
point of view that studies of concurrent validity have been conducted
using ratings as criteria.



Construct Validity



Two types of studies were conducted to investigate the construct
validity of ratings of habitual performance. The first entailed
analysis of the relation between level of training and level of ratings.
This study revealed slight differences in the expected direction. For
example, utilizing a 12-point scale the mean rating on "Overall
Competence" was 7.7 for residents with 1-2 years' training and 8.1 for
residents with 3-4 years' training. While this difference is
statistically significant for the number of cases included in the study, one
must question its practical significance.



The second study of construct validity of the ratings entailed
analysis of the interrelationship among scores on the several factors,
to determine the amount of halo in each. Table 19 summarizes the
important correlational data for both the Resident and Candidate Evaluation
Forms. It indicates some independence of factor scores but also reveals
a strong halo effect. This independence is more marked in the Resident
Evaluation Form than in the Candidate Evaluation Form, probably due to
higher reliability of the former. Note, for example, that the
correlation between ratings of Surgical Technique and Patient Relationships on
the Resident Form ranges between .35-.43 when the same rater judges both,
but only .17 across raters; the correlation between ratings on these two
factors on the Candidate Evaluation Form is .58-.62 for the same rater
and .10-.12 across raters. Further, it is important to note that as a
consequence of both the halo effects and the differential reliabilities
of different factor ratings, the correlation between two ratings of the
same factors is in some instances lower than the correlation between
two judges' ratings of different factors. For example, on the Resident
Evaluation Form the correlation between two ratings of Surgical
Technique is .35 while the correlation between one rater's rating of Surgical
Technique and a second rater's rating of Problem Solving is .33-.34. This
type of correlational pattern is even more pronounced in the Candidate
Evaluation Form.



In view of these findings it is of particular interest to examine
the relationship between ratings on various factors of habitual
performance and scores on tests designed to measure these same behavioral
factors. Table 20 presents illustrative data of this type. It indicates,
as would be expected, that scores on the recall component of the multiple
choice examination are less closely associated with ratings of "Patient
Relationships" than with ratings of "Problem-Solving" skill. Much of
the data of this type on the concurrent validity of the rating forms is
based on the assumption that the forms are valid, and that the validity
of the testing techniques can be assessed by analyzing their value in
predicting ratings, treating the latter as dependent variables. Such
analyses therefore properly belong with the discussions of the
individual techniques that are treated as independent variables. However,
given the factorial structure of certain evaluation techniques it is
also appropriate to analyze the pattern of interrelationships between
ratings and test scores where the various rating factors are treated
as the independent variables.

* For a full discussion of this aspect of concurrent validity of
techniques, see Section Two, Chapters VI through VIII.


TABLE 20

ILLUSTRATIVE CORRELATIONS BETWEEN
SELECTED RATING FACTORS AND TEST SCORES

1968 ORTHOPAEDIC CERTIFICATION EXAMINATION

N=391

                                      Test Scores

                          Multiple     Oral Mean
                          Choice       Problem       Simulation
Rating Factors            Recall       Solving       Attitudes

Problem Solving            .33          .31           .19
Surgical Technique         .16          .21           .10
Patient Relationships      .10          .21           .15



Such data are summarized in the multiple correlational analysis
reported in Table 21. These data reveal that the rating form factors
have different weights when used to predict different test scores. It
is especially interesting to note the effects of variables that have
negative partial r's. For example, note that Overall Competence and
Problem-Solving account for most of the correlation between the
Multiple Choice Recall score and the rating factors, and that this
correlation is improved when scores on Patient Relationships and
Surgical Technique are given negative weights. This phenomenon is
probably explained in part by the fact that in rating Overall Competence the
supervisor takes into account the cognitive, psychomotor and affective
skills of the subject; if he perceives two men as equally competent,
the man with higher affective and psychomotor skills will be likely to
have lower cognitive skills since the chief's perception is the result
of an amalgam of all three components of competence.

The multiple R's reported in Table 21 and in the subsequent tables
in this section may seem dismayingly small, and such a conclusion would
be justified if the tests had been developed for purposes of predicting
supervisors' ratings. Rather they are intended as measures of
competence for purposes of certification. The studies of concurrent validity
have therefore been designed to test the hypothesis that the tests measure
important areas of competence. Second, in interpreting the correlations
between ratings and test scores it is necessary to recognize that
the reliabilities of sub-scores of each are in some cases quite low.
For example, the reliability of the multiple choice recall score is .71
and in no case does the reliability of a rating factor exceed .31 (See
Table 18). Given this much error in the two sets of scores a multiple
correlation of .34 is of considerable practical, as well as theoretical,
significance.
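
The force of this argument can be illustrated with the classical correction for attenuation, under which the correlation observable between two fallible measures cannot exceed the square root of the product of their reliabilities. The report itself does not carry out this correction, so the sketch below is an assumption-laden illustration: the function names are hypothetical, and the formula applies strictly to a simple correlation between two scores, so its use with a multiple correlation is only a rough guide.

```python
from math import sqrt

def max_observable_r(rel_x, rel_y):
    """Upper bound on an observed correlation, given the two reliabilities."""
    return sqrt(rel_x * rel_y)

def disattenuated_r(r_xy, rel_x, rel_y):
    """Estimate of the correlation that error-free measures would show."""
    return r_xy / sqrt(rel_x * rel_y)

# Figures quoted in the text: test reliability .71, best rating reliability .31,
# observed multiple correlation .34.
ceiling = max_observable_r(0.71, 0.31)
corrected = disattenuated_r(0.34, 0.71, 0.31)
```

On these figures the ceiling is about .47, so an observed value of .34 amounts to roughly .72 of the maximum that measurement error would allow.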

In summary, the rating form data in the present study have been
useful primarily as a means of gathering evidence on the validity and
the factor composition of the various test techniques. The results to
date suggest that by increasing the number of observations, and by
specifying the factors to be rated in more operational detail, the
reliability of the ratings can be raised sufficiently to justify their
use in evaluating the relative effectiveness of varied curricular
settings. Ultimately, however, the main impact of the observational
techniques developed for this study will depend on the degree to which
they are of assistance to those responsible for training in defining
program objectives in more specific terms, in monitoring residents'
progress and in diagnosing the strengths and weaknesses of both the
resident and the program.
Ratings of Specific Incidents of Performance



Observational Rating of On-the-Job Performance



In addition to ratings of relevant aspects of habitual performance
it is often useful to record an examiner's evaluation of a specific
incident of on-the-job performance. In the current study only one such
rating form, The Form for Evaluation of Surgical Skill, has been developed.
An excerpt from that form* is shown below, illustrating the specification,
the description, and the method of rating one factor.






Factor I. Initial preparation for surgery

Did surgeon reaffirm procedure with patient before surgery?
Was surgeon properly gowned? (Cap over hair, mask tight,
careful timed scrub of hands, gets into gown and gloves
properly.) Did he review procedure and position with
anesthesia staff? Was the position of patient appropriate
for procedure? (Surgery, tourniquet, X-rays, bone graft,
etc.) Did he know the names of nursing and medical staff?
Was preparation of patient's skin adequate? (Check area
scrubbed, technique of applications, method of discarding
sponges, etc.) Was draping satisfactory and appropriate to
procedures? Did he prevent contamination by others?

4 □ Excellent

3 □ Good

2 □ Adequate

1 □ Poor



Comments




Note that there are alternative possible ways of scoring performance.
For example, it would be possible to require the observer simply to
answer "yes" or "no" to each of the questions listed in the excerpt.
Such a method could be expected to achieve high reliability in initial
ratings; however, it would still be necessary to determine (presumably,
on the basis of the "yes" and "no" answers) whether the surgeon had
mastered the procedure at a satisfactory level. In the present study
the decision was therefore made to utilize the questions merely to define
the factor to be evaluated and to require the examiner to make a direct
value judgment about the adequacy of the behavior observed.



To date, this form has been employed only experimentally in small 
pilot studies; no statistical data are as yet available on the reliability 
and validity of ratings derived from it.



* See Appendix 9 for the complete text of The Form for Evaluation
of Surgical Skill.

Observations of Oral Simulations and
Oral Tests of Complex Cognitive Behavior

Since oral simulations are complex performances, the problems of 
developing and applying rating forms for scoring oral examinations are 
similar to those involved in evaluating on-the-job performance.

The effectiveness of such examinations depends upon two conditions:
first, the ability of the examiners to develop techniques which elicit
behaviors that do, in fact, sample important areas of competence; and
second, the development of rating procedures that will yield reliable
assessments of the behaviors elicited by the techniques. The first
condition is discussed in subsequent chapters in connection with
summaries of the studies of oral examinations; the second condition is
the main issue of concern here.



During the course of the Orthopaedic Training Study, four types of
forms for rating oral examinations were developed.* Two approaches are
represented in these forms. The first approach is illustrated by the
following excerpt taken from the "Rating Form for Use with Patient
Interviews." The reader will note that this approach is similar to that
discussed above in the rating of habitual performance, and like its
counterpart, can suffer from various types of errors of classification, e.g.,
one examiner's "04" is equivalent to another's "07" even though both
agree on what the examinee did. However, it is important to observe that
so long as this represents systematic differences in standards, errors
of this type do not influence the correlation between the scores of two
raters.
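
The point that a systematic difference in standards leaves the correlation untouched can be checked directly: subtracting a constant from every rating given by one examiner changes his mean but not his ordering or spacing of examinees. The sketch below assumes the Pearson coefficient and uses invented ratings; `pearson_r` is an illustrative helper, not a procedure taken from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical ratings of six examinees on the 12-point scale.
examiner_a = [4, 6, 5, 8, 7, 9]
examiner_b = [7, 8, 8, 11, 9, 12]

# The same second examiner applying a standard three points more severe.
severe_b = [score - 3 for score in examiner_b]

r_original = pearson_r(examiner_a, examiner_b)
r_severe = pearson_r(examiner_a, severe_b)
```

The two coefficients are equal, although a pass-fail decision based on the raw scores of the severe examiner would of course differ; this is why such errors matter for classification even when they do not lower the correlation.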



Factor I: Ability to elicit an adequate amount of pertinent information

(The candidate should ask most of the indicated questions;
other questions should be appropriate to the diagnosis.)



01  02  03  04  05  06  07  08  09  10  11  12

□   □   □   □   □   □   □   □   □   □   □   □

Poor        Adequate    Good        Excellent



In the present study it was found that in utilizing the form illustrated
above, the correlation between two examiners' ratings of the same set
of half-hour oral examinations was generally about .70.** It would
therefore appear that much of the difficulty in generalizing from oral
examination data is due not so much to errors of perception in human
judges, but to errors of classification and sampling. This is an
important finding since classification errors can be reduced by
statistical correction and by pooling data from many examiners, while
sampling errors can be reduced by enlarging the sample of examinee
behavior observed. Alternatively, if the judges disagree seriously in
the ranking of examinees, it is not possible to use the ratings as
measures of competence, since they will reflect rater "bias" rather than
the "real" behavior of the examinee.

* See Appendices 10 through 13 for copies of these forms.

** For a detailed summary of the data see Section Two, Chapter VII.



RATING SCALE 



INSTRUCTIONS:



Place a tally mark in the appropriate box for EACH 
statement EACH candidate makes. 
The second approach to the development of rating forms is illustrated
in the excerpt from the "Rating Form for Simulated Patient Management
Conferences" quoted above.* In using this form the examiner is directed
to record the number of times that a specified behavior occurs. On
theoretical grounds it would appear that this approach would tend to
maximize inter-observer reliability despite the fact that observers will
disagree on the classification of some specific behavioral incidents.
Unfortunately, it was not possible in the present study to obtain data
on the use of this method with a sufficient number of cases to test this
hypothesis.



Summary Comment



The objectified forms developed in this study for rating both 
habitual performance and oral examinations appear to provide valuable 
information on aspects of examinee performance not readily assayed by 
written examination. However, both types of ratings are subject to
errors of classification and sampling which must be taken into account 
in generalizing the data derived by such methods. With due allowance 
for such errors, the forms developed for use in this study yield ratings 
of resident performance at different levels of training that differ in 
the expected direction, and ratings of candidate performance that are 
associated in the hypothesized ways with estimates derived from more 
objective written exercises. 



* For a description of this examination and the complete text of the
form see Section Two, Chapter VII, and Appendix 12.
CHAPTER VI



WRITTEN SIMULATION EXERCISES 



Statement of the Problem



In the critical incident study a number of activities relating
to the processing of information were identified as essential ingredients
of competency in the practice of orthopaedic surgery. The
following are illustrative of these critical requirements:



1. Obtaining adequate information from the patient

2. Consulting other physicians

3. Checking other sources

4. Directing or ordering appropriate films

5. Obtaining biopsy specimen

6. Persisting to establish definitive diagnosis



The study further identified some of the elements involved in the
ability to make appropriate decisions in complex situations. The
following are illustrative:

1. Indicating suitable treatment for condition 

2. Treating with regard to special needs 

3. Choosing wisely between simple and radical approach

4. Delaying therapy until diagnosis better established 

5. Treating most critical needs first 

6. Reassessing, altering or repeating treatment 

However, both the process and the statistical analyses of the commonly
used oral and written techniques revealed that they did not yield
adequate assessments of most of these skills and abilities. In the
clinical setting, the individual problem-solver is usually confronted
initially with only very limited information, such as the presenting
complaint of a patient. From this information he must generate a
hypothesis, gather data and, on the basis of these data, generate new
hypotheses. Sometimes the most important component of competence
consists in the ability to carry out the process of hypothesis testing
effectively. Less competent individuals may fail either by coming to
premature conclusions or by refusing to make a decision when the
situation demands that type of behavior. Furthermore, they may misinterpret
results, pursue false hypotheses with stubborn persistence or make
serious errors in judgment about the significant data they need to
obtain or about the findings they do collect or about the relative
weight of relevant factors in arriving at a decision. In sharp contrast
with this reality, most conventional written examinations provide
significant cues to the examinee by presenting him with a limited amount
of information which is, by definition, adequate for the solution of
the problem posed, a form of cueing rarely present in real life. To
avoid this distortion of reality various simulation techniques have
been devised for assessment of certain aspects of professional
competence relating to clinical judgment in orthopaedic surgery.

Description of Written Simulations

The written simulations developed in the current study employ
a special answer sheet with an erasable overlay which can be used to
give immediate feedback to examinees. The problems are designed
to require the examinee to make choices from an almost unlimited number
of broad strategic routes, several of which may lead to an acceptable
result.

A typical problem is initiated by a brief description in either 
verbal or visual form of the patient's presenting complaint. For 
example, a problem might be introduced with the following statement: 

    You are called to the emergency room of a hospital to see
    a 50-year-old woman patient who has been rushed to the
    hospital after collapsing at a luncheon a half-hour ago.
    The patient is in severe pain.

The examinee is then required to select from a number of possi- 
bilities a course of action reflecting his estimate of the seriousness 
and urgency of the situation. The following are illustrative of the 
choices offered at this point:

    You would NOW (Choose only ONE):

    1. Obtain further history
    2. Perform a physical examination
    3. Hospitalize patient for further evaluation and therapy
    4. Prepare patient for urgent surgery
    Etc.

The examinee records his decision by erasing the opaque overlay
from a specially constructed answer sheet. His erasure will reveal
either instructions directing him to the next section of the problem
or feedback regarding the results of his decision. If, in the
illustration quoted above, the examinee had selected item 3, "Hospitalize
patient for further evaluation and therapy," his erasure would reveal
the words: "Turn to Section F." In Section F he would be confronted
with an extended list of possible interventions such as those listed
below, and on erasing would find the results of his orders as shown
in parentheses:



    In light of the available information you would NOW order
    (Select as many as you consider indicated):

    211. Hemoglobin determination (11.0 gm%)
    212. Chest X-ray (see X-ray number 72)
    213. Electrocardiogram (see tracing number 102)
    Etc.

On the basis of these new data the examinee is required to make further 
decisions about the next steps in the diagnosis and treatment of this 
patient.

Each such problem is constructed to allow both for different medical 
approaches and for variation in patient responses appropriate to these 
several approaches. The stages in the work-up and the responses to the 
specific interventions the examinee chooses are meticulously designed to 
simulate the clinical situation. For example, in response to an order
for a specific test, a laboratory report is revealed by erasure of the
overlay; in response to an order for an X-ray, electroencephalogram,
electrocardiogram, etc., the examinee is referred to a high quality
reproduction of the X-ray or tracing; if the student orders a blood
smear he is referred to a color plate of the smear; if he orders
medication the patient's response is reported. Even the complications which
must be managed differ from person to person depending (as they do in
the office or clinic) on the unique configuration of prior decisions
each has made. For some, the erasures will reveal an instruction to
skip one or more sections of a problem because the approach they have
chosen is effective in avoiding potential complications with which
others must cope. If, however, at any stage the examinee orders
something harmful or fails to take measures essential to the recovery of the
patient, he uncovers a description of the clinical features of the
complication that has developed. He is then directed to a special section
where he has the opportunity to take heroic measures to rectify his
previous errors; if the remedial measures are inadequate he may be
instructed that the problem is terminated because the patient has
suffered a relapse and has been sent to another hospital, or has been
referred to a consultant, or has died.
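The branching structure described above can be pictured as a lookup from each option to the text hidden under the overlay. The following is only a minimal sketch under stated assumptions: the dictionary layout, the section label "A", the function names and the reveal "Turn to Section B." are hypothetical; only the option wordings, the instruction "Turn to Section F." and the three Section F results are taken from the illustration in the text.

```python
# Hypothetical encoding of one branching problem. Each section maps an
# option to the text that erasing the overlay would reveal: either an
# instruction to move to another section or the result of an order.
PROBLEM = {
    "A": {  # opening section; the label "A" is an assumption
        "Obtain further history": "Turn to Section B.",  # invented reveal
        "Hospitalize patient for further evaluation and therapy":
            "Turn to Section F.",
    },
    "F": {
        "Hemoglobin determination": "11.0 gm%",
        "Chest X-ray": "See X-ray number 72",
        "Electrocardiogram": "See tracing number 102",
    },
}


def erase(section: str, option: str) -> str:
    """Return what erasing the overlay for `option` in `section` reveals."""
    return PROBLEM[section][option]


def order_all(section: str, options: list) -> dict:
    """Erase several options in one section (for 'select as many' items)."""
    return {option: erase(section, option) for option in options}


if __name__ == "__main__":
    print(erase("A", "Hospitalize patient for further evaluation and therapy"))
    print(order_all("F", ["Hemoglobin determination", "Chest X-ray"]))
```

A fuller model would also have to record the order of erasures, since the complications an examinee meets depend on the whole sequence of his prior decisions.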

The construction and analysis of these written simulations suggest
that there are two somewhat distinct types: one, the diagnostic problem
in which the gathering of information is the predominant element, and
the other, the treatment problem in which choice among various
therapeutic possibilities is the predominant element. The types of decisions
required in diagnostic problems appear to differ from those required in
treatment problems. In the former, the choice is between doing more or
less, whereas in the treatment problems the choice is more likely to be
between two mutually incompatible courses of action. The consequences
of inappropriate actions will therefore also differ in the two types
of problems. Most relatively short problems are easily identified
as predominantly one type or the other, and the more extended problems
can usually be divided into treatment and diagnostic sections and
separate scores can be derived for each type of problem or from each
section of the more extended problems.

Scoring Written Simulations

The problems are scored by asking a criterion group of subject matter
specialists to classify each option in the problem as belonging to one of
the following categories:



++ Category:  Choices which are CLEARLY INDICATED and IMPORTANT
              in the care of THIS patient at THIS stage in the
              work-up or management;

+ Category:   Choices which are CLEARLY INDICATED but of a more
              ROUTINE nature, i.e., should be selected but are
              not of special significance in the care of THIS
              patient at THIS stage;

0 Category:   Choices which are OPTIONAL, i.e., the probability
              that they will be helpful for THIS patient at THIS
              stage is fairly remote or quite debatable;

- Category:   Choices which are CLEARLY NOT INDICATED though NOT
              HARMFUL in the management of THIS patient at THIS
              stage;

-- Category:  Choices which are clearly CONTRA-INDICATED (i.e.,
              are definitely harmful or carry an unjustifiably
              high cost in terms of risk, pain or money) in the
              care of THIS patient at THIS stage.



Options in the + and ++ categories are assigned positive weights
of a magnitude to reflect the importance of the decisions; similarly,
procedures the criterion group has identified as contra-indicated are
given negative weights of varying sizes. The maximum number of points
obtainable by selecting all indicated procedures and avoiding all useless
and harmful ones is calculated. The examinee's score is reported as a
percentage of this maximum. The score is called the Net Score or
the "Proficiency Score" and is reported for a test as a whole, or for
various types of problems, or for individual problems or even for
sections of problems. Further, it should be noted that a given Proficiency
Score can be achieved in quite different ways. For example, some
individuals select relatively few indicated procedures while avoiding most
contra-indicated ones. Their major errors are errors of omission, a
characteristic pattern of some practicing physicians. Others select
most of the indicated items, but also choose numerous contra-indicated
ones. Their major errors are errors of commission, a pattern which
is quite common for neophytes, such as medical students. For all
special studies of written simulation the following scores have been
calculated:

    Score on Proficiency
    Score on Selection of Indicated Procedures
    Score on Avoidance of Contra-Indicated Procedures

each for:

    The Total Test
    The Diagnostic Problems
    The Treatment Problems
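The scoring rule described in this section can be sketched in a few lines. This is an illustration under stated assumptions and not the scoring program used in the study: the option names and weights are invented, options in the 0 and - categories are assumed to carry a weight of zero, and the formulas for the two sub-scores are one plausible reading of their titles.

```python
# Hypothetical option weights for one section of a problem. Positive weights
# stand for the + and ++ categories, negative weights for the -- category;
# zero is assumed here for the 0 and - categories.
WEIGHTS = {
    "Obtain history": 2,
    "Physical examination": 2,
    "Chest X-ray": 1,
    "Serum electrolytes": 0,
    "Repeat biopsy": -1,
    "Exploratory surgery": -3,
}


def score(selected, weights=WEIGHTS):
    """Return the three scores, each on a percentage scale."""
    max_points = sum(w for w in weights.values() if w > 0)
    max_penalty = -sum(w for w in weights.values() if w < 0)
    gained = sum(weights[o] for o in selected if weights[o] > 0)
    lost = -sum(weights[o] for o in selected if weights[o] < 0)
    return {
        # Net points as a percentage of the maximum obtainable.
        "proficiency": 100 * (gained - lost) / max_points,
        # Share of the available positive points actually selected.
        "indicated_selected": 100 * gained / max_points,
        # Share of the available negative points the examinee avoided.
        "contraindicated_avoided": 100 * (1 - lost / max_penalty),
    }


if __name__ == "__main__":
    print(score({"Obtain history", "Physical examination", "Repeat biopsy"}))
```

Reporting the two sub-scores beside the Proficiency Score is what allows errors of omission to be told apart from errors of commission when two examinees earn the same net score.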



Reliability of Written Simulations



The written simulation exercises are so unconventional in form
that they pose new problems in defining and computing reliability.
In estimating the reliability of these exercises, reliability has been
defined as the amount of error involved in generalizing from the results
to some universe; the major issue then becomes the definition of the
universe to which one wishes to generalize. All studies done to date
strongly suggest that even one fairly lengthy simulation is highly
reliable if one wishes to generalize to a universe of similar problems
dealing with similar disease entities. However, if one wishes to
generalize about some global ability, such as "clinical judgement,"
for example, then it is necessary to use several simulations to achieve
a reasonably reliable estimate. This is explained by the fact that
only modest correlations are obtained between scores on different
simulation problems, for the same reasons that in "real life" lead to
superb physician performance with one patient and only mediocre or even
poor performance with others.



In estimating the reliability of simulation exercises for purposes
of generalizing to a universe of similar problems, sampling similar
components of competence, a technique analogous to the split-half method
has been employed in this study.* This technique is based on the
assumption that an individual has equal opportunity to select all items.
This assumption is appropriate for diagnostic problems since it is
possible for an examinee to make independent choices. However, in
treatment problems, one choice often precludes other choices; consequently,
the technique is inappropriate for most treatment problems, for which
some form of analysis of variance must be used. Unfortunately, to
date, no orthopaedic examination has included a sufficient number of
treatment problems to permit effective application of analysis of variance.
For this reason, techniques of estimating reliability from corrected
correlations between part test and total test scores have been utilized
in this study to estimate the reliability of both diagnostic and
treatment problems despite the fact that other evidence suggests that




* The specific method employed in this study involved obtaining the
correlation between scores on every third item versus scores on the
total test and correcting with Angoff Formula 12.
such a method results in a serious underestimate of the reliability of
treatment problems.
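The first step of the part-versus-total procedure can be sketched as follows. The sketch computes only the uncorrected correlation between a score on every third item and the total score; the correction by Angoff Formula 12 is not reproduced, because the formula is not given in this chapter. The item scores and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a score is constant")
    return cov / sqrt(var_x * var_y)


def part_total_correlation(item_scores, step=3):
    """Correlate the score on every `step`-th item with the total score.

    item_scores holds one row per examinee and one weighted item score per
    column. The result is uncorrected, so it is not itself a reliability
    estimate.
    """
    part = [sum(row[::step]) for row in item_scores]
    total = [sum(row) for row in item_scores]
    return pearson(part, total)


if __name__ == "__main__":
    # Invented item scores for four examinees on a six-item problem.
    scores = [
        [2, 1, 0, 2, -1, 1],
        [2, 0, 0, 1, 0, 1],
        [0, 1, -3, 0, -1, 0],
        [2, 1, 0, 2, 0, 1],
    ]
    print(round(part_total_correlation(scores), 2))
```

Because the part score is contained in the total, the raw correlation is inflated by the shared items, which is one reason a correction is required before the figure can be read as a reliability coefficient.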



The results of the various studies of reliability of written
simulations are shown in Table 22, which reveals that most of the diagnostic
problems achieve reliabilities in excess of .90, where one is
attempting to generalize to a universe of problems similar in both content and
process. Thus, we can be reasonably certain from the data that
examinees who fail in a simulation exercise to diagnose a Charcot hip, for
example, genuinely failed to handle that problem adequately, and that
the results are not due to accidental factors. One cannot conclude,
however, that such examinees would fail to make an accurate diagnosis
in a clinical problem involving some other diagnostic entity; nor
can one conclude that failure in one problem indicates that the
physician is a poor diagnostician. Estimates of the latter must be based on
analysis of performance on a large number of problems, and in a variety
of settings.

Validity of Simulation Exercises



Content Validity



Though the written exercises were designed to simulate reality, 
they cannot duplicate it for the reasons that: 



1. Confrontation with the real patient whose manner, appearance
   and physical reactions provide many clues, both helpful and
   distracting, is eliminated.

2. The exercises necessarily involve compression and distortion
   of time scale which may lead to exploration of blind alleys
   longer than is likely in the clinic or ward.

3. The examination format may impose an arbitrary pattern of
   exploration and intervention.

4. Real life pressures (e.g., a waiting room full of patients)
   are eliminated.

5. As with other examination formats, this may be perceived as
   requiring the examinee to anticipate what those who have constructed
   the examination are looking for, rather than to consider only
   what is best for the patient and convenient to himself.



Such challenges require thoughtful study of at least two hypotheses 
which underlie both written and oral simulations: 



TABLE 22

RELIABILITY OF PROFICIENCY SCORES ON SIMULATION EXERCISES

EXAMINATION PROGRAM                     N       PROBLEM  NATURE OF   LENGTH OF    RELIABILITY
                                                NO.      PROBLEM     TIME FOR     COMPUTED BY
                                                                     PROBLEM      ANGOFF 12

May 1965 Certification Exam - I         408     I        Diagnostic  45 Min.       .86

*Nov 1965 In-Training Examination       1495    I        Diagnostic  15 Min.       .53
                                                II       Treatment   15 Min.       .0

*Jan 1966 Certification Examination
  Candidates                            402     I        Diagnostic  15 Min.       .57
                                                II       Treatment   15 Min.      -.38
                                                III-A    Diagnostic  15 Min.       .97
                                                B        Diagnostic  15 Min.       .91
                                                C        Treatment   [illegible]  [illegible]
  *Examiners                            184     I        Diagnostic  15 Min.       .32
                                                II       Treatment   15 Min.      -.32
                                                III      Diagnostic  15 Min.       .98
                                                III      Diagnostic  15 Min.       .91
                                                III      Treatment   15 Min.       .50

May 1966 Certification Examination - I  450     I        Diagnostic  15 Min.       .91
                                                I        Treatment   15 Min.      -.06
                                                I        Total       30 Min.      [illegible]

Nov. 1966 In-Training Examination
  First Year Residents                  256     I        Treatment   15 Min.       .00
                                                I        Diagnostic  15 Min.       .97
                                                II       Treatment   15 Min.       .56
                                                II       Diagnostic  15 Min.       .81
                                                I+II     Total       60 Min.       .90
  Second Year Residents                 464     I        Treatment   15 Min.       .23
                                                I        Diagnostic  15 Min.       .97
                                                II       Treatment   15 Min.       .40
                                                II       Diagnostic  15 Min.       .80
                                                I+II     Total       60 Min.       .89
  Third Year Residents                  345     I        Treatment   15 Min.       .36
                                                II       Diagnostic  15 Min.       .97
                                                II       Treatment   15 Min.       .23
                                                II       Diagnostic  15 Min.       .76
                                                I+II     Total       60 Min.       .89
  Fourth Year Residents                 390     I        Treatment   15 Min.       .56
                                                II       Diagnostic  15 Min.       .97
                                                II       Treatment   15 Min.       .39
                                                II       Diagnostic  15 Min.       .82
                                                I+II     Total       60 Min.       .91

Jan. 1968 Final Certification Examination
  Candidates                            575 **  I        Treatment   15 Min.       .00
                                                I        Diagnostic  15 Min.       .89
                                                II       Treatment   15 Min.      -.19
                                                II       Diagnostic  15 Min.       .76
                                                I+II     Total       60 Min.       .60



* NOTE: The two problems on the November 1965 In-Training Examination
were given in slightly altered form to the candidates and
examiners in the 1966 Final Certification Examination.

** Sub-sample composed of all graduates of U.S. medical schools taking
the OCE for the first time.










1. That the mental processes involved in working through
   a simulation exercise are sufficiently similar to those
   required in effective clinical practice that data on the
   former will provide valuable information on the latter;
   and

2. That the individual's approach to simulation problems
   reveals attitudes which are characteristic of his
   management of actual clinical problems.

Evidence relevant to these hypotheses is presented below in the
discussion of construct and concurrent validity.



Construct Validity 



Several approaches to the analysis of the construct validity of
written simulations have been employed in this study. The first
entails investigation of the relationship between performance and such
background variables as amount of training, age, and practice setting.
Three hypotheses were considered in this part of the study.

(1) That increased training would be associated with higher
    proficiency scores and that this growth would be
    predominantly in therapeutic decision-making rather
    than in diagnostic thoroughness.

(2) That beyond a certain point, increasing age would be
    associated with lower proficiency scores and that this
    decline in performance would be manifest primarily in
    a greater tendency to take diagnostic shortcuts.

(3) That, since the problems had been constructed and scored
    primarily by physicians in academic settings, according to
    their value systems, clinicians in those settings would
    perform better than physicians practicing in other settings.



Evidence relevant to Hypothesis (1) was collected in the
In-Training Examinations. Table 23, which summarizes the data on written
simulations from the last three such examinations, indicates that there
was no significant difference in proficiency scores on diagnostic
problems between first and fourth year residents whereas, with one
exception, these two groups differ significantly on proficiency scores
on treatment problems. In the one exception, the treatment score was
strongly linked to the diagnostic score on that particular problem.
These data lead to the conclusion that increased training seems to be
associated with increased ability (as measured by these problems) to
make effective treatment decisions without concomitant growth in the
ability to gather information for purposes of solving diagnostic problems.



TABLE 23

Relation Between Level of Training
and Proficiency Scores†

                                                            Mean Proficiency Score         Difference
                                                              of Residents In:             Between
                                    Problem  Type of      First   Second  Third   Fourth   First and
Examination                         No.      Problem      Year    Year    Year    Year     Fourth Year

Nov. 1965 In-Training Examination   I        Diagnostic    73      71      74      75      + 2
                                    II       Treatment     28      31      31      35      + 7*
                                    (N)                    258     309     369     430

Nov. 1966 In-Training Examination   I        Diagnostic    62      63      55      59      - 3
                                    II       Treatment    -10     - 7     - 7     - 4      + 6*
                                    II       Diagnostic     7       7       6       5      - 2
                                    II       Treatment     17      20      20      15      - 2
                                    (N)                    456     531     345     390

Nov. 1967 In-Training Examination   I+II+III Diagnostic    36      36      36      36        0
                                    I+II+III Treatment     43      46      52      54      +11*
                                    (N)                    244     513     499     399

† All scores are expressed as raw scores.

* Significant at .05 level of confidence.

Insight into the possible explanations of these phenomena was obtained 
from a study of the responses of residents, candidates and examiners to 
similar problems. These data, summarized in Table 24, suggest that those 
with more experience tend to be more willing to trust their judgement 
and to take unavoidable, radical action earlier than less experienced 
physicians.






TABLE 24

COMPARISON OF MANAGEMENT DECISIONS AMONG SELECTED GROUPS
1966 In-Training and Certification Examinations

                                   Percentage of Each Group
                                     Selecting the Option
Recommended Action              Residents   Candidates   Examiners
                                (N=1366)    (N=403)      (N=184)

Amputate,* first opportunity       10          20           28
Amputate,* later                   26          20           22
Total                              36          40           50

* Amputation was the optimal course of action.



Data on hypotheses (2) and (3) were obtained from the 1966 Final
Certification Examination in which identical written simulation
exercises were administered to both candidates and examiners, and the
responses of the latter group were further analyzed according to age and
academic affiliation. The results, summarized in Table 25, reveal that,
among the examiner group, increasing age was associated with lower scores
on both treatment and diagnostic problems and that, while the younger
examiners showed a marked superiority to the candidates on the treatment
problems, no such superiority was manifest on the diagnostic problems.
Table 26 suggests one possible explanation: in the diagnostic work-up,
examiners seek substantially less information than candidates and, as
indicated in Table 27, this tendency among examiners is exacerbated with
increasing age, a finding similar to those noted in the observational
studies of Peterson and Clute. Finally, as shown in Table 25, the
responses of full-time academicians on the written simulations were in
much closer agreement with the criterion group than were the responses
of examiners from other practice settings. In short, the experienced
physician is more likely to take diagnostic shortcuts and is more willing
to take decisive action in treatment; among the experienced physicians,
those who practice in academic settings are, not surprisingly, more likely
to behave according to standards established by criterion groups which are
heavily weighted with academicians.



In summary, the data from studies of the construct validity of the 
written simulations are encouraging in that differences in the responses 
of various groups are in the expected direction and the patterns of re- 
sponse are closely similar to those reported in observational studies of 
physician performance in clinical settings. 




** Differences significant at .01 level.
TABLE 26

ANALYSIS OF STRATEGIES IN GATHERING
INFORMATION ON WRITTEN SIMULATIONS

1966 Final Certification Examination, Problem III

                                    Percentage Selecting      Total Percentage
                                    as the Initial            Selecting Procedure
                                    Procedure                 at Any Point
Procedure                           Candidates  Examiners     Candidates  Examiners

* Obtain a history                     61          42            67          50
Obtain a physical examination           8          13            66          57
Obtain laboratory and X-ray data       10          12            64          48
Obtain a biopsy                        19          29            95          96
Initiate treatment                      2           4            92          92

* Optimum choice according to criterion group




TABLE 27

EXAMINERS' PROFICIENCY SCORES ON HISTORY
AND PHYSICAL EXAMINATION SECTION OF WRITTEN
SIMULATIONS

1966 Final Certification Examination, Problem III

Age Group      N      Mean %    Standard Deviation

Under 40       16     38.3      31.1
41-45          44     25.3      25.2
46-50          36     22.1      24.4
51-55          32     19.8      23.2
56-60          33     15.1      20.7
Over 60        18      1.3       3.8
Evidence regarding the construct validity of written simulation
exercises was also obtained from correlational and factor analytic
studies of the relationships between performance on the simulation
exercises and that on other types of test exercises. Since the written
simulation exercises had been designed to measure aspects of competence
not sampled by more conventional devices it was predicted: (1) that
there would be relatively low correlations between scores on written
simulation exercises and scores on other tests and (2) that the factor
structure of scores on written simulations would differ from that of
scores on other tests.



Data relevant to the first hypothesis are presented in Table 28.
These data, as accumulated from the administration of a number of
problems in several different tests given to different groups of
examinees, are consistent: the correlation between scores on written
simulation exercises and scores on other evaluative techniques is in no
case high and in most cases does not differ from zero. The data suggest
that not more than 10%-20% of the variance is common to all techniques
and that this common variance can probably be attributed to a common
informational base requisite to performance on any of the tests. Beyond
that, the orals, the simulation exercises and the multiple choice
questions appear to be measuring somewhat different aspects of competence.
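The statement about common variance can be checked against the larger coefficients in Table 28 by squaring them, since the square of a correlation gives the proportion of variance two measures share. The sketch below is illustrative only: the three coefficients are read from the multiple choice column of Table 28, and the function name is hypothetical.

```python
def shared_variance_pct(r: float) -> float:
    """Percentage of variance shared by two measures that correlate r."""
    return 100 * r * r


if __name__ == "__main__":
    # Three of the larger correlations with the multiple choice examination.
    examples = [
        ("Jan. 1968, I+II Total", 0.35),
        ("Nov. 1967, I+II+III Diagnosis", 0.29),
        ("Jan. 1966, I Diagnosis (Lab)", 0.24),
    ]
    for label, r in examples:
        print(f"{label}: r = {r:.2f}, shared variance = "
              f"{shared_variance_pct(r):.1f}%")
```

On this reading even the largest coefficient shown, .35, corresponds to roughly 12 percent of shared variance.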

Data relevant to the second hypothesis are presented in Tables 6
and 7 above, which summarize the more significant results obtained
in the two factor analytic studies that have been conducted to date. In
the first study, as summarized in Table 6, three independent factors emerged:
the multiple choice examination and conventional orals loaded on one factor;
written simulations of the diagnostic type loaded heavily on a second
factor; and written simulations of the treatment type loaded moderately
on a third factor. The second study, summarized in Table 7, revealed a
somewhat more complex, but essentially similar factor structure in the
1968 Final Certifying Examination. In short, both yield data
compatible with the second hypothesis stated above.

Concurrent Validity

The concurrent validity of the written simulation exercises was
investigated by means of correlational and multiple regression analyses
in which total and sub-scores on the exercises were used together with
other test variables to predict supervisors' ratings of the habitual
performance of residents. In the first such study, conducted on the
1966 In-Training Examination, residents in the first 2 years of the
training program and those in the last 2 years were treated as different
populations since it was felt that supervisors would use different
criteria in evaluating the two groups. The results, as summarized in
Table 29 and detailed in Appendix 14, indicate that the total score on
the Written Simulation Exercises makes no significant contribution to
the prediction of supervisors' ratings of any aspect of habitual
performance. However, the picture differs markedly when sub-scores on the
TABLE 28

CORRELATIONS BETWEEN WRITTEN SIMULATION PROFICIENCY SCORES
AND SCORES ON OTHER EVALUATION TECHNIQUES

                                                     TYPES OF SCORES
                                  ADULT           PROBLEM         SIMULATION      MULTIPLE
                                  CONVENTIONAL    SOLVING         ORAL            CHOICE
PROBLEM NO.                       ORAL (1/2 HR)   ORAL (15 MIN)   (10 MIN)

January 1966 Final Cert Exam, N=383
I    Diagnosis (Lab)               .14             .05             .05             .24
II   Treatment                    -.03            -.07            -.01             .07
III  Diagnosis (Hist. & Phys.)     .05            [illegible]      .06             .10
     Diagnosis (Lab)               .18             .10             .11             .18
     Treatment                     .03             .00             .06             .08

May 1966 Cert. Exam I, N=408
I    Diagnosis                      -               -               -              .16
II   Treatment                      -               -               -              .18
I    Total                          -               -               -              .21

November 1966 In-Training Exam, First Two Years, N=109
I    Treatment                     .08            -.02            -.01             .01
I    Diagnosis                    -.05            +.24            +.12             .07
II   Treatment                    -.08            -.05            -.07            [illegible]
II   Diagnosis                    -.19            +.10            +.17             .06
I&II Total                        -.06             .11            -.01             .05

November 1966 In-Training Exam, Second Two Years, N=109
I    Treatment                     .19             .07            -.00             .23
I    Diagnosis                    -.14             .04            -.00             .23
II   Treatment                    +.14            -.06            -.01            -.05
II   Diagnosis                    -.17             .07            -.15            -.04
     Total                        -.06             .05            -.04             .18

January 1967 Cert Exam I, N=407
I    Treatment                     .01             .02             .05             .08

November 1967 In-Training Exam, N=1682
I+II+III Treatment                  -               -               -              .27
I+II+III Diagnosis                  -               -               -              .29

January 1968 Final Cert Examination, N=784 *
I+II Diagnosis                      -              .16             .02             .31
I+II Treatment                      -              .20             .07             .24
I+II Total                          -              .23             .04             .35

*NOTE: In the January 1968 Examination the Problem Solving Scores are based
on a combination of 4 half-hour orals.




simulation exercises are considered. For example, for residents in the
first two years of training, scores on the avoidance of contra-indicated
procedures in diagnostic problems are negatively correlated with
supervisors' ratings. That is, individuals who ask numerous irrelevant
questions and who select many contra-indicated procedures seem to be more
inquisitive and more thorough though probably less well-informed than
others in their diagnostic inquiries, and it appears that the chief of
training values curiosity and thoroughness and tends to disregard lack
of information on the part of the relatively inexperienced residents.
In contrast, for residents in the last two years of training, scores on
the avoidance of contra-indicated procedures in diagnostic problems
were positively correlated with supervisors' ratings, probably a
reflection of the differences in standards training chiefs apply to
neophytes and to experienced practitioners.



Such data raised so many puzzling questions that the concurrent
validation study of the 1966 In-Training Examination was replicated
on the 1968 Final Certifying Examination. The results are summarized
in Table 30 and detailed in Appendix 15. They suggest, as did the
study of the 1966 In-Training Examination, that several different
types of tests are needed to predict competence as defined by
supervisors, since in all cases the three best predictors included scores
from both written and oral exercises. Second, as would be expected,
the several predictor variables are differentially useful in predicting
different criterion behaviors. For example, the partial correlation
of the Multiple Choice Recall score with supervisors' ratings of
Information Gathering Behavior is substantially higher than that with
ratings of Effectiveness in Emergency Care. Third, certain of the
subscores in the simulation exercises appear to make a useful contribution
to the prediction of supervisors' ratings of certain types of affective
behavior, for example, Effectiveness in Colleague and in Patient
Relationships, in assuming Continuing Responsibility and in providing
Emergency Care. In short, these data suggest that the written
simulations may be measuring certain styles and attitude sets as well as
sampling purely cognitive behavior.
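
For readers unfamiliar with the term, a partial correlation expresses the
association between two variables after the influence of a third has been
held constant. The following is a minimal Python sketch of the standard
first-order formula only; the function name and the input values are
hypothetical and are not taken from the study, whose analyses involved
several predictors at once and may have been computed differently.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant.

    r_xy, r_xz and r_yz are ordinary (zero-order) Pearson correlations.
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("undefined when |r_xz| or |r_yz| equals 1")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical illustration (values invented for the example):
# x = a test subscore, y = a supervisor rating, z = another test subscore.
r = partial_correlation(r_xy=0.40, r_xz=0.50, r_yz=0.30)
print(round(r, 3))
```

When the third variable is uncorrelated with both of the others, the
partial correlation reduces to the ordinary correlation.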



Summary 



In summary, studies of the reliability and validity of the written
simulation exercises suggest that such problems do require the examinees
to demonstrate certain types of behavior similar to that required of them
in clinical settings. However, the ability to handle such problems differs
markedly from one problem to another, depending in part on the content of
the problem; hence, several problems are required to obtain reasonably
reliable results. Nevertheless, despite limited generalizability across
problems, scores based on even a few problems, when combined with data
from other evaluation techniques, make a significant contribution to
the assessment of competence in orthopaedics. Secondly, such exercises
appear to sample some types of behavior (e.g. diagnostic thoroughness)
not easily observed by supervisors and not generally rewarded in training
programs. Third, exercises of this type appear to sample certain
affective components of competence not readily measured with conventional
cognitive techniques.

Though further work is required to develop optimum methods of
constructing and scoring simulation exercises, they appear to be making
a sufficiently reliable and valid contribution to the more comprehensive
assessment of professional competence to justify their expanded use.



1. R. Damrin, Glaser, et al., "The Tab Item: A Technique for the
   Measurement of Proficiency in the Problem Solving Task", A. A. Lumsdaine
   [...] A Source Book, Washington, NEA, 1960, pp. 275-285.

2. Christine McGuire and David Babbott, "Simulation Technique in the
   Measurement of Problem Solving Skills", Journal of Educational
   Measurement, Spring, 1967, pp. 2-12.

3. Arieh Lewy and Christine McGuire, "A Study of Alternate Approaches
   in Estimating the Reliability of Unconventional Tests", read at
   Annual meeting of the AERA, Feb. 18, 1966.

4. Kenneth Clute, The General Practitioner, (Toronto) University of
   Toronto Press, 1963.

5. Osler Peterson, et al., "An Analytical Study of North Carolina
   General Practice, 1953-54", Journal of Medical Education, 1956,
   31, No. 12 (whole part 2).
TABLE 29-A

RESULTS OF MULTIPLE CORRELATIONAL ANALYSIS
USING RATING FACTORS AS DEPENDENT
VARIABLES AND TEST SUBSCORES AS INDEPENDENT VARIABLES

1966 In-Training Examination

FIRST AND SECOND YEAR RESIDENTS
N = 109

Dependent Variables:                     Independent Variables:                      partial
Rating Factors          R      F         Test Scores                                    r          F

Factual Information    .53   1.83**      Problem I, Written Simulation                 .28       7.84**
                                         Score on Avoidance of Contra-
                                         Indicated Diagnostic Procedures

                                         Proposed Treatment Interview              [illegible]   7.42**
                                         Overall Score

                                         Proposed Treatment Interview                 -.26       6.46*
                                         Interaction Score

                                         Problem II, Written Simulation                .24       5.20*
                                         Score on Avoidance of Contra-
                                         Indicated Treatment Procedures

                                         Problem I, Written Simulation             [illegible]   4.71*
                                         Score on Selection of Indicated
                                         Diagnostic Procedures

Problem Solving                          None Significant

Information Gathering                    None Significant

Ethics                                   None Significant

 * Significant at .05 level
** Significant at .01 level
TABLE 29-B

RESULTS OF MULTIPLE CORRELATIONAL ANALYSIS
USING RATING FACTORS AS DEPENDENT
VARIABLES AND TEST SUBSCORES AS INDEPENDENT VARIABLES

1966 In-Training Examination

THIRD AND FOURTH YEAR RESIDENTS
N = 119

Dependent Variables:      Independent Variables:                      partial
Rating Factors            Test Scores                                    r          F

Factual Information       None Significant

Problem Solving           Adult Oral Overall                            .22       4.87*

                          Problem I, Written Simulation             [illegible]   4.02*
                          Score on Avoidance of Contra-
                          Indicated Diagnostic Procedures

Information Gathering     Diagnostic Interview Subscore on          [illegible]   4.41*
                          Diagnosis

                          Problem I, Written Simulation                 .20       3.85
                          Score on Avoidance of Contra-
                          Indicated Treatment Procedures

Ethics                    Proposed Treatment Interview                 -.31       9.14**
                          Subscore on Manner

                          Problem II, Written Simulation            [illegible]   7.28**
                          Score on Selection of Indicated
                          Treatment Procedures

                          Problem I, Written Simulation                -.21       4.0[illegible]
                          Score on Selection of Indicated
                          Diagnostic Procedures

R values as printed (row assignment not legible): .56, .63, .55
F values as printed (row assignment not legible): 1.85*, 2.70*, 2.01*

 * Significant at .05 level
** Significant at .01 level
CHAPTER VII
ORAL EXAMINATIONS



Conventional oral examinations are increasingly subject to the
often legitimate criticisms that they are inherently unreliable due
to both sampling and rating errors, that they are frequently invalid
because they are not designed to evaluate a number of important areas
of competence, that the aspects of competence which they do evaluate
cannot be precisely determined because they are so unstructured and
unstandardized, and that they are unduly expensive in terms of examiner
time and administrative effort. Despite these defects the oral
examination has persisted as an evaluation technique in those situations
where the limited number of examinees permits, primarily because the
stubborn conviction prevails that the oral examination measures some
ill-defined aspects of performance not measured by other means, and
because the oral preserves some element of personal contact as a basis
for making an important decision about an individual and also provides
the examiner with a feeling of participation in the evaluation process
difficult to obtain with other methods. Consequently, in restructuring
the traditional oral to avoid the defects and capitalize on the benefits
cited above, two considerations were paramount in the present study. The
first was to utilize the opportunity afforded by an oral examination to
sample such interpersonal skills as ability to relate to and communicate
with patients and colleagues. The second was to take advantage of
opportunities for examiner-examinee dialogue to sample the higher level
cognitive processes entailed in interpretation and problem-solving. These
considerations led to the development of three quite different
examination formats, each deserving of separate analysis.



The Overall Design of the New Oral Examinations

Previous experience with written simulations had confirmed numerous
advantages in sampling complex cognitive processes with exercises that
require the examinee to make decisions similar to those required in real
life; however, such exercises in written form have certain limitations.
Specifically, they permit the examinee to select from a list of possible
responses rather than requiring him to generate his own inquiries; in so
doing they impose certain strictures on the examinee's mode of inquiry
that are not present in real life, while relieving him of certain
pressures toward efficiency that are characteristic of the clinical
situation; finally, in a written exercise, it is always possible to
know what a person chooses but often impossible to determine why he
makes a particular choice.

It appeared to the study staff that problem-solving exercises
could be developed in an oral format that retained the essential
characteristics of the written simulation exercises while avoiding
their limitations.



In the first such exercise, the Diagnostic Interview, the examinee
was given a brief description of a patient's presenting
complaint. It was his task to question the examiner, who was programmed
with the details of the case, in order to arrive at a diagnosis. During
the history-taking part of the inquiry the examinee played the role
of physician while the examiner played the role of a patient; no
role-playing was involved in the inquiry regarding physical and laboratory
findings. The examinee was allowed 12 minutes to gather the information
and three minutes to explain his diagnostic impressions. A second
examiner observed the proceedings but did not participate.



This exercise, together with two other role-playing exercises designed
to evaluate ability to relate to and communicate with patients and
with colleagues, was first administered experimentally in January 1966
along with four conventional oral examinations. In order to maximize
the validity and reliability of these experimental orals, standardized
case materials were prepared, an objectified rating form was developed
(see Appendix 10) and all participating examiners were asked to attend
a two-day training program to prepare them for administering and scoring
the new examinations.

Analysis of the results of these experimental examinations, together
with those obtained from a second experimental series conducted in 1967,
led to the following restructuring of the oral component of the 1968
Final Certification Examination: The 2½ hours allotted to the orals was
divided into 5 half-hour examinations, one of which was designed to sample
skills in relating effectively to patients and colleagues, one to sample
interpretive skills and the remaining 3 to sample problem-solving skills.
Of the three problem-solving orals, one was devoted to problems of
adult orthopedics, one to problems of children's orthopedics and one to
problems of trauma.



The half-hour designed to sample skills in relating to patients and
colleagues consisted of 3-4 role-playing exercises in which the candidate
took the role of physician and the examiner took the role of patient or
colleague in a specifically described situation typical of the problems
the physician encounters in working with others.

The nature of the half-hour oral designed to sample interpretive
skills is probably best conveyed by the instructions to examiners
shown in the exhibit on the following page. During this exercise
the candidate was presented with 3-4 sets of slides and/or X-rays,
each set relating to a single case. It was his task to describe the
findings present.

Each of the half-hour orals devoted to problem-solving
consisted of 2-3 problems of the following types:

1. The Diagnostic Problem: This type of problem resembled the
   Diagnostic Interview described previously, with the single
   exception that the role playing included in the history-taking
   portion of the earlier format was entirely eliminated.

2. The Defense of Therapy Problem: In this type of problem the
   examinee was presented with a brief description of a specific
   clinical case; it was his task to outline in detail the plan
   of management he would recommend; it was the examiner's task
   to ask probing questions that would require the candidate to
   explain his rationale and to defend his decisions.

3. The Emergency Treatment Problem: In this type of problem the
   examinee was presented with a brief description of a specific
   emergency case; it was his task to detail the steps he would
   take and to indicate his priorities in that particular emergency,
   while the examiner provided feedback regarding the consequences
   of each action recommended by the examinee.

4. The Complication Problem: In this type of problem the examinee
   was presented with a brief description of a specific case
   representing a problem of chronic illness. As in the Emergency
   Treatment Problem it was his task to detail the steps he would
   take while the examiner provided feedback regarding the
   consequences of each action he recommended.

The first two types of problems were employed in the problem-solving
orals on Adult and Children's Orthopedics, and the latter two types in
that on Trauma.

In all 5 oral examinations the candidate was rated on 4 separate
factors, i.e. components of competence; these were recall of factual
information, analysis and interpretation of clinical data,
problem-solving ability and professional attitudes, as defined in Table 31.

While the same factors were considered in scoring all examinations, the
value assigned each factor varied among the 5 orals so as to place maximum
weight on the one or more aspects of competence each type of oral was
specifically designed to sample.

EXHIBIT IV

Administration of the Observation and
Interpretation Examination

You should first present the material to the candidate and instruct
him to describe what he sees precisely as he might in a written
report, indicating any abnormalities that may be present.

If the candidate fails to interpret the material properly you might
supply him with some additional historical, physical examination or
laboratory data which would assist him. You should not provide this
additional information until AFTER he has initially described his
findings. If he identifies some abnormalities you should then ask
additional questions which would probe his ability to interpret what
he sees in the light of his knowledge and understanding of physiological
and pathological processes. For example, you might ask him to speculate
as to the reason that the structures on the slide show the pattern they
do, or you might ask the probable effect on the abnormality of various
types of treatment.

DO NOT SPEND TOO MUCH TIME ON ANY ONE EXERCISE.

Remember, you are mainly concerned with what the candidate sees and
how he interprets it. If you ask too many questions about diagnosis
the candidate may simply answer them on the basis of his basic
information and not on the basis of any observational skills. Although we
recognize that it is extremely important to assess the candidate's
basic store of information on diagnosis and treatment, this area of
competence is being probed thoroughly in other portions of the
examination.
TABLE 31

Oral Examination Rating Form: Explanation of Factors

Factor I    Recall of Factual Information

            This factor deals with the candidate's factual knowledge of
            general medicine and orthopaedics as displayed by his ability
            to discuss the cases during the examination. Note that
            candidates can score high on this factor, but still do badly
            in Factor III because they have difficulty in integrating the
            information they possess. If you believe that the factual
            content of the examination is too simple to allow for any
            judgment of the store of information of the candidate, check
            "Unable to Judge" rather than "Good" or "Excellent".

Factor II   Analysis and Interpretation of Clinical Data

            This factor deals with the candidate's ability to perceive
            the characteristics, both normal and abnormal, of material
            presented to him in visual form (such as X-rays, slides,
            motion pictures, photographs, etc.) and to explain what he
            has seen.

Factor III  Problem-Solving Ability -- Clinical Judgment

            This factor deals with the candidate's ability to use the
            information he has to make appropriate decisions in patient
            diagnosis and treatment as displayed by the data he solicits
            about patients, the diagnostic and therapeutic conclusions
            he comes to, and his ability to provide a rationale for the
            decisions he makes.

Factor IV   Relates Effectively -- Shows Desirable Attitudes

            This factor deals with the ability of the candidate both in
            statements and in manner to communicate effectively and
            convey genuine concern for patients, respect for colleagues and
            an understanding of the ethical responsibilities of a physician
            in his relationships with others.
In order to ensure that all examinations were administered and
scored as planned, detailed instructions to examiners and to candidates
were prepared outlining the procedures to be followed (see Appendices
16 and 17). Secondly, as in the previous experimental orals, training
sessions were conducted for all examiners and standardized case materials
were supplied to each. Finally, to ensure maximum objectivity and
reliability of the orals, examiners were instructed to utilize a
12-point scale in rating candidate performance,* and to record their
judgments on the special form reproduced below.



ORAL EXAMINATION RATING FORM

Scale headings, left to right:
Unable To Evaluate / Definite Failure / Marginal / Good / Excellent

Recall of Factual              □   □   □   □   □   □   □   □   □   □   □   □   □
Information                    00  01  02  03  04  05  06  07  08  09  10  11  12

Analysis and Interpretation    □   □   □   □   □   □   □   □   □   □   □   □   □
of Clinical Data               00  01  02  03  04  05  06  07  08  09  10  11  12

Problem-Solving Ability;       □   □   □   □   □   □   □   □   □   □   □   □   □
Clinical Judgment              00  01  02  03  04  05  06  07  08  09  10  11  12

Relates Effectively;           □   □   □   □   □   □   □   □   □   □   □   □   □
Shows Desirable Attitudes      00  01  02  03  04  05  06  07  08  09  10  11  12

Data on the reliability and validity of each of the new oral
examinations are summarized in the succeeding sections of this chapter.



*Analogous factor scores derived from the multiple choice and
 written simulations were converted to the same 12-point scale
 to facilitate comparison of factor scores across techniques
 and computation of a composite score on each factor.
The Diagnostic Interview



The interrater reliability of the Diagnostic Interview was
assessed by having two examiners independently rate an examinee
on the same fifteen-minute interview. The results of these studies,
shown in Table 32A, indicate high inter-rater agreement, especially
in the light of the limited size of the observational sample.
However, prior experience with written simulations suggested that
adequate reliability across cases would be much harder to achieve,
and in the sole study of the combined effects of error due both
to examiner disagreement and to variation in examinee ability to
handle different cases, the estimated coefficient of reliability
was very low -- .25 (see Table 32B). Since most of the error
appeared to be attributable to restrictions in the content sampled,
it was concluded that, if the oral examinations were to be useful
evaluation techniques, it would be necessary to increase the number
of cases to which each candidate was exposed and to pool the data
from all examiners and all problems. This procedure was followed
in the 1968 Final Certification Examination with the results
reported below.

The Oral Tests of Complex Cognitive Skills as Revised



Due to administrative complications, it was impossible to obtain
estimates of interrater reliability of the oral problem-solving
and interpretation exercises because only one examiner was available
to administer each examination. However, if one assumes that the
four cognitive orals employed in the 1968 Final Certification
Examination are equivalent forms, then the correlations between them
can be used as estimates of reliability. Table 33A reports the
intercorrelation of the Problem-Solving scores on the four examinations.
The Spearman-Brown correction formula yields a reliability
estimate of .47 for a combination of four tests with an average
reliability of .18. These results are consistent with those obtained
by employing the ANOVA formula developed by Ebel, in which each
group of candidates who were rated by the same team of examiners is
considered as a block. As shown in Table 33B, the reliability
estimates for the 4 such blocks studied in the 1968 Final
Certification Examination ranged from .40 to .63 and the average was .53.

The results from the two methods of estimating reliability indicate
that the best estimate of the combined sampling and rater reliability
of the oral problem-solving score on that examination was
approximately .50.
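
The Spearman-Brown projection quoted above can be checked with a few
lines of arithmetic. The sketch below, in Python, is offered only as an
illustration: the function name is an assumption of this example, and the
intercorrelations are the six off-diagonal values reported in Table 33A.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Projected reliability of k pooled measurements, each with
    reliability r_single, assuming the measurements are parallel forms."""
    return k * r_single / (1.0 + (k - 1.0) * r_single)


# Off-diagonal intercorrelations among the four orals (Table 33A).
pairwise = [0.17, 0.29, 0.13, 0.18, 0.18, 0.12]
mean_r = sum(pairwise) / len(pairwise)

print(round(mean_r, 2))                    # 0.18
print(round(spearman_brown(0.18, 4), 2))   # 0.47
```

The same function applied with k = 2 to the single-examiner correlation
of .14 in Table 32B gives approximately .25, the pooled two-examiner
value shown there.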
TABLE 32A

INTERRATER RELIABILITY IN SCORING
OVERALL COMPETENCE ON DIAGNOSTIC INTERVIEWS

                                                  Correlation of
Examination               N      Mean Scores      Team Scores      Reliability*

1966 Final Cert. Exam.   383        6.7               .58              .73
1967 Final Cert. Exam.   387        6.8               .63              .78
1966 In-Training          33     Not Computed         .63              .78

* Computed by Spearman-Brown formula for pooled scores of both examiners.



TABLE 32B

RELIABILITY OF THE OVERALL COMPETENCE SCORE ON THE
DIAGNOSTIC INTERVIEW ACROSS CASES AND EXAMINERS

                              Correlations with Another
Examination           N       Examiner Using a New Case      Reliability*

1966 In-Training      25               .14                       .25

* Computed by Spearman-Brown formula for pooled scores of both examiners.



TABLE 33A

INTERCORRELATIONS OF PROBLEM SOLVING SCORES ON THE
THREE PROBLEM SOLVING ORALS AND THE OBSERVATION
AND INTERPRETATION ORAL, 1968 FINAL CERTIFICATION EXAMINATION

N=391

                                                       Observation and
                      Adult     Child     Trauma       Interpretation

Adult                   -        .17       .29             .13
Child                  .17        -        .18             .18
Trauma                 .29       .18        -              .12
Observation and
  Interpretation       .13       .18       .12              -

TABLE 33B

ESTIMATES OF RELIABILITY OF ORAL PROBLEM SOLVING SCORES
1968 FINAL CERTIFYING EXAMINATION

                      Reliability of          Reliability of
Block       N         One Score in Block      Pooled Scores in Block

A           27              .29                     .63
B           27              .14                     .40
C           27              .23                     .55
D           27              .22                     .53

Mean                        .22                     .53




While pooled data from 4 oral examinations may be extremely useful
in generalizing about groups, they are not sufficiently reliable to use
alone to certify individuals; to be of value for this purpose, they
must be used in combination with other test data. As shown in the next
section, the data on the validity of the oral examinations support this
view.



Validity of the Oral Examinations of Complex Cognitive Skills

Content Validity



The process analysis of the conventional orals revealed that they
measured predominantly the recall of factual information. The new
orals were deliberately designed to elicit more complex cognitive
behavior from candidates. However, there was considerable question as
to whether the examiners, accustomed as they were to administering
oral quizzes, would be able to adjust to the new examination techniques.
Consequently, a systematic observational analysis of a random sample
of the over 3500 oral examinations administered in January 1968 was
made by a trained team composed of 12 physicians and educators.* The
results (see Table 34) indicate a significant shift in the behavior
of both examiners and candidates: in the traditional orals, candidates
spent most of the time replying to specific questions posed by the
examiner; in contrast, in the new orals they spent most of the time
questioning the examiner to obtain data for interpretation in arriving
at conclusions which they then explained to the examiner. Secondly,
examiner-candidate behavior differed in the expected directions among
the three major types of new orals (problem-solving, interpretation
and role-playing).

* See Appendix 28 for the complete report of the observational
  analysis.

TABLE 34: NATURE OF EXAMINER-CANDIDATE BEHAVIOR



As a further check on the content validity of the examinations,
examiners and candidates were asked to complete a questionnaire
indicating their acceptance of the new techniques.* Table 35 reports
examiners' responses to the statement: "The portion of the examination
I administered provided me with valuable information about the
candidate's ability in some important area of orthopaedics."



TABLE 35

EXAMINER APPROVAL OF NEW ORAL EXAMINATIONS
1968 FINAL CERTIFICATION

                                  Strongly                                        Strongly
                                   Agree       Agree     Undecided    Disagree    Disagree
Examination                 N     N    %      N    %      N    %      N    %      N    %

Problem-Solving: Adult     36     6   17     30   83      0    -      0    -      0    -
Problem-Solving: Child     35     4   11     29   83      1    3      1    3      0    -
Problem-Solving: Trauma    34     9   26     25   73      0    -      0    -      0    -
Observation and
  Interpretation           34    14   41     18   53      2    6      0    -      0    -
Role-Playing Simulations   50    11   22     28   56      4    8      5   10      2    4

Total                     189    44   23    130   68      7    4      6    3      2    1


* See Appendices 18 and 19 for the complete report of the
  questionnaire study.

While the results reveal some reservations about the value of the
role-playing simulations, they also indicate an overwhelming examiner
approval of the new cognitive orals: among the over 100 examiners who
administered the new problem-solving orals, only one felt that they did
not yield valuable information about an important aspect of competence
and only one other was undecided about this issue. Candidate response,
as shown in Table 36, was almost equally favorable. Of special interest
in this regard is the fact that there were no significant differences
between the responses of candidates who passed the examination and
those who failed it.
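
The totals and percentages of Table 35 can be recomputed from the row
counts. The following Python sketch is illustrative only: the variable
names are assumptions of this example, and the Strongly Agree count for
the Adult oral is taken as 6, the value implied by the printed 17 percent
of 36 respondents.

```python
# Rows of Table 35: (examination, N, [strongly agree, agree,
# undecided, disagree, strongly disagree]).
rows = [
    ("Problem-Solving: Adult", 36, [6, 30, 0, 0, 0]),
    ("Problem-Solving: Child", 35, [4, 29, 1, 1, 0]),
    ("Problem-Solving: Trauma", 34, [9, 25, 0, 0, 0]),
    ("Observation and Interpretation", 34, [14, 18, 2, 0, 0]),
    ("Role-Playing Simulations", 50, [11, 28, 4, 5, 2]),
]

# Each row's response counts should account for every respondent.
for name, n, counts in rows:
    assert sum(counts) == n, name

total_n = sum(n for _, n, _ in rows)
column_totals = [sum(counts[i] for _, _, counts in rows) for i in range(5)]
column_percents = [100.0 * c / total_n for c in column_totals]

print(total_n, column_totals)
print([round(p, 1) for p in column_percents])
```

Summing only the three problem-solving rows gives the number of
problem-solving examiners and shows that a single respondent was
undecided and a single respondent disagreed.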

TABLE 36

PERCENT OF CANDIDATES AGREEING WITH SELECTED STATEMENTS
ABOUT COMPONENTS OF THE 1968 CERTIFYING EXAMINATION

                                         Written Tests                  Oral Examinations
                                       -------------------  -----------------------------------------------
                                       Multiple  Written    Problem-Solving Orals     Observation  Role-
                           Response    Choice    Simu-      ------------------------  and Inter-   Playing
Statement                  Group       Test      lations    Trauma  Adult  Children's pretation    Simulations

Gave me a chance to        100 who
demonstrate my abilities   passed        43        61         86      75      82         69           67
in some important areas    100 who
of orthopaedic surgery     failed        42        52         85      70      75         65           65

Most topics covered        100 who
were important in          passed        52        76         91      85      86         72           78
orthopaedic practice       100 who
                           failed        52        79         91      84      87         73           77

Examiner was skillful      100 who
in putting me at           passed         -         -         72      68      74         72           78
my ease                    100 who
                           failed         -         -         63      60      66         64           70

Examiner DID NOT give      100 who
me a chance to answer      passed         -         -          4      11       3          2            0
questions adequately       100 who
                           failed         -         -          3       3       1          1            0

In summary, the data from both systematic observational analysis
and from questionnaire studies suggest that the new orals are
characterized by high content validity with respect both to the material
sampled and the examinee behavior elicited.

Construct Validity

To determine the construct validity of the new examinations, data
from the perceptions of observers, candidates and examiners were
supplemented, where possible, by studies of examinee performance at
different levels of education and experience, and by factor analytic
studies of the interrelations among various test scores. Data of the first
type are available only for the Diagnostic Interview and, as summarized
in Table 37, indicate that differences between groups at different
levels of training are statistically significant and in the expected
direction.



TABLE 37

OVERALL COMPETENCE SCORE ON DIAGNOSTIC
INTERVIEW BY LEVEL OF TRAINING

                                     Scores
Level of Training       N        Mean*      S.D.

1st year                29        5.4        2.2
2nd year                75        6.8        1.7
3rd year                50        6.9        2.5
4th year                79        7.6        2.5

Total                  233        6.9        2.8

* Differences between means significant at .01 level by ANOVA

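
A one-way analysis of variance of the kind cited in the footnote to
Table 37 can be approximated from the published group sizes, means and
standard deviations alone. The Python sketch below is a hedged
illustration and not the study's own computation: the function name is
an assumption of this example, the inputs are the rounded values as read
from Table 37 in row order, and the standard deviations are assumed to
be sample standard deviations.

```python
from typing import Sequence, Tuple


def anova_f_from_summary(
    ns: Sequence[int], means: Sequence[float], sds: Sequence[float]
) -> Tuple[float, int, int]:
    """One-way ANOVA F ratio from group sizes, means and sample SDs.

    Returns (F, df_between, df_within).
    """
    k = len(ns)
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = k - 1
    df_within = total_n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Values as read from Table 37 (1st through 4th year residents).
ns = [29, 75, 50, 79]
means = [5.4, 6.8, 6.9, 7.6]
sds = [2.2, 1.7, 2.5, 2.5]

f_ratio, df_b, df_w = anova_f_from_summary(ns, means, sds)
print(df_b, df_w, round(f_ratio, 2))
```

With these rounded inputs the F ratio comes out at roughly 7 on 3 and
229 degrees of freedom. Because the table reports rounded summary
values, the figure is only an approximation of whatever the original
analysis obtained.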



Data from the two factor analytic studies bearing on the construct 
validity of the new oral examinations are summarized below.* 

* See Tables 6 and 7 and Appendices 14 and 15 for detailed presenta- 
tion of data. 
In the 1966 Final Certification Examination, the Diagnostic
Interview was shown to have a moderate loading on a recall or content
factor, and a heavy loading on a separate factor. This second factor
appeared to be related to the ability to withhold judgment, since it
was at the opposite pole from the Proficiency Score on the written
simulations of treatment problems included in that examination, which
placed emphasis on decisiveness in taking action. The factor analytic
study of the 1968 Final Certification Examination indicated that
restructuring the oral examinations altered the factor structure in the
direction of significantly increased factorial complexity. In commenting
on these data, it should be noted, first, that all of the orals had
at least moderate loadings on a factor which appears to be the ability
to respond effectively in oral situations; this factor was so named
because the Role-playing Simulations, which are designed to test ability
to communicate with and relate to others, had the highest loading
on this factor. Second, each of the four new cognitive orals had at
least moderate loadings on two or more factors, and one (the Observation
and Interpretation Oral) had loadings on all five of the
factors that emerged from the analysis. Two of the Problem-Solving
Orals, Trauma and Adult Orthopaedics, had substantial loadings on a
content factor; the third, Children's, did not load at all on this
factor. The Observation and Interpretation Oral loaded moderately
on this factor, and both it and the Problem-Solving Oral in Children's
Orthopaedics had moderate loadings on a factor which appeared to be
related to the ability to draw reasonable inferences from ambiguous
data; this factor was so named because the score on Selecting Indicated
Procedures in the Written Simulations of treatment problems
also had a very high loading on it.



In summary, the factor analytic studies strongly suggested that
at least some of the new orals were multi-factor in nature. In
recognition of this fact, all oral examiners were instructed to rate
candidates on several aspects of competence and these sub-scores were
differentially weighted in the various oral examinations, so as to
assign greatest importance to the one component which each new type
of exercise had been specifically designed to assess. While it was
recognized that the intercorrelations among these ratings could be
expected to be relatively high, it was also clear that low
intercorrelations might be attributed either to limited reliability of
ratings or to true independence of the factors being rated, and that
choice between the two explanations would depend on the degree to
which sub-scores on the various orals were correlated with sub-scores
on other techniques designed to measure similar components of
competence. The data bearing on these two hypotheses are summarized
below and discussed separately for each type of oral exercise
designed to measure complex cognitive behavior.
                                TABLE 38

          INTERCORRELATIONS OF SUB-SCORES ON PATIENT INTERVIEWS
                      EXAMINER I VERSUS EXAMINER II

                     1966 CERTIFICATION EXAMINATION

                                 N=383

                                     Scores Assigned By Examiner II

                               Diagnostic Interview      Proposed Treatment
Scores Assigned                                               Interview
By Examiner I                 1    2    3    4    5      1    2    3    4    Total

Diagnostic Interview
  1. Information Gathering   .53  .37  .45  .48  .52    .39  .38  .42  .41    .52
  2. Communication           .39  .36  .39  .37  .40    .39  .40  .40  .41    .46
  3. Efficiency              .53  .42  .53  .52  .55    .41  .39  .42  .43    .56
  4. Diagnosis               .41  .37  .43  .55  .52    .35  .37  .37  .41    .50
  5. Overall                 .52  .42  .50  .56  .58    .42  .43  .45  .46    .57

Proposed Treatment Interview
  1. Statements              .41  .38  .43  .43  .45    .59  .60  .62  .64    .60
  2. Manner                  .45  .42  .46  .46  .50    .63  .65  .65  .67    .65
  3. Interaction             .41  .40  .42  .44  .47    .64  .66  .66  .69    .63
  4. Overall                 .45  .42  .46  .45  .50    .67  .69  .69  .72    .66

Total                        .55  .48  .55  .57  .60    .60  .61  .62  .65    .69
Sub-scores on the Diagnostic Interview. The factor structure of the
Diagnostic Interview was analyzed in the 1966 Final Certification
Examination, in which the Diagnostic Interview (DI) was administered as
a role-playing exercise, together with the Proposed Treatment Interview
(PTI). Two examiners independently scored each interview with a
specially developed form, on which 3 to 4 sub-scores and an overall
score were recorded separately for each part of the interview.* The
intercorrelations within and between the two raters' scores, shown in
Table 38, reveal reasonably high inter-rater agreement on the Overall
Score for the Proposed Treatment Interview and for the total score on
the combined Diagnostic and Treatment Interviews, but substantially less
agreement on the several part scores. Indeed, it is of some concern that
the correlations between ratings assigned by the two examiners on
different sub-scores are often as high as those between ratings assigned
on the same sub-score. When the ratings of the two examiners are pooled,
thus increasing their reliabilities, the intercorrelations between
sub-scores usually exceed .70 and are, in some cases, as high as .90
(see Table 39). These data reveal a strong halo effect in the assignment
of sub-scores, with the Communication Sub-Score on the Diagnostic
Interview being the most independent. Subsequent correlations and factor
analytic studies confirmed this impression and suggested that, in view
of its relation to scores on the traditional orals and on certain
components of the written examination (see Table 40), this sub-score
was, in part at least, a measure of affective behavior.
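
The concern about different sub-scores agreeing as well as identical ones can be made concrete with the Diagnostic Interview portion of Table 38. The sketch below assumes the matrix is read with Examiner I in the rows and Examiner II in the columns, in the order Information Gathering, Communication, Efficiency, Diagnosis, Overall; the function `halo_summary` and that reading of the table are illustrative assumptions, not part of the original analysis.

```python
# Cross-rater correlations for the five Diagnostic Interview sub-scores,
# transcribed from Table 38 (rows: Examiner I, columns: Examiner II).
# Assumed order: Information Gathering, Communication, Efficiency,
# Diagnosis, Overall.
di_block = [
    [.53, .37, .45, .48, .52],
    [.39, .36, .39, .37, .40],
    [.53, .42, .53, .52, .55],
    [.41, .37, .43, .55, .52],
    [.52, .42, .50, .56, .58],
]


def halo_summary(matrix):
    """Compare same-sub-score (diagonal) with different-sub-score cells."""
    size = len(matrix)
    same = [matrix[i][i] for i in range(size)]
    different = [
        matrix[i][j] for i in range(size) for j in range(size) if i != j
    ]
    # Off-diagonal cells that match or exceed the diagonal cell of their row:
    # each one is a case where raters agree as well on different sub-scores
    # as on the same sub-score.
    rivals = sum(
        1
        for i in range(size)
        for j in range(size)
        if i != j and matrix[i][j] >= matrix[i][i]
    )
    return sum(same) / len(same), sum(different) / len(different), rivals


same_mean, different_mean, rivals = halo_summary(di_block)
print(f"mean same-sub-score r      = {same_mean:.3f}")
print(f"mean different-sub-score r = {different_mean:.3f}")
print(f"off-diagonal cells >= row diagonal: {rivals} of 20")
```

A small gap between the two means, together with a non-trivial count of off-diagonal cells that rival the diagonal, is the pattern one would expect if a halo effect is present.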



                               TABLE 40

              CORRELATION OF COMMUNICATION AND DIAGNOSIS
             SUB-SCORES ON THE DIAGNOSTIC INTERVIEW (DI)
                       WITH SELECTED VARIABLES

              1966 ORTHOPAEDIC CERTIFICATION EXAMINATION

                                N=383

                            Written Examinations            Oral Examinations
                   Total    Multiple    Simu-                                     Staff
                   Score    Choice      lations    Total   Path-    Child-   Conference
                                                           ology    ren's    Simulation

DI Communication    .30       .28        .11        .46     .26      .15        .18
DI Diagnosis        .20       .19        .04        .38     .20      .09        .09

* See Appendix 10 for a copy of the rating form and descriptions of
the sub-scores.
Analyses of these data and considerations of practicality led
to the decision to administer the Diagnostic Interview and Proposed
Treatment Interview separately in the 1968 oral examinations and to
the revision of the rating form so as to require all oral examiners
to rate candidates on the same four aspects of competence: Recall,
Interpretive Skill, Problem-Solving Ability and Attitudes.* Table 41
reports the intercorrelations between these sub-scores for the five
oral examinations included in the 1968 Final Certification Examination.
Even with the revision in procedures described above, it appears
that these sub-scores are closely interrelated, that the score
on the attitude component again manifests the greatest independence,
and that the sub-scores on the oral Simulations are generally
characterized by a somewhat different pattern from that of other oral
tests. While, theoretically, this could be explained as being due to
differential reliabilities of the subtests, the correlations between
oral examination sub-scores and other variables, including supervisors'
ratings (see Table 42), suggest that the attitude score is
at least as reliable as others and is substantially less influenced
by the cognitive skills sampled in the multiple choice and written
simulation tests. This interpretation is supported by the factor
analytic studies (see Table 7) showing that the Simulation Orals do
not load on the same factors as do other orals. These data suggest
the advisability of collapsing the oral Recall, Problem Solving and
Interpretive Skills scores. This modification was, in effect,
accomplished by the weighting system that was developed and the ground
rules that were established for determining the "pass-fail" levels
in the 1968 Final Certification Examination.**



In summary, despite the "halo" effects in scoring oral examinations,
there is considerable evidence that interpretive ability differs
somewhat from what is called problem solving ability, and that
both of these differ from attitudes and communication skills.
Hopefully, these factors can be purified and the intercorrelations
between them reduced by the development of problem exercises and rating
forms that incorporate improved methods, and by the further training of
examiners to utilize these techniques more reliably.



* See Appendix 13 and Table 31 for a copy of the rating form 
and a description of the rating factors. 

** See Chapter x for a description of the system employed in 
combining sub-scores and setting standards in the 1968 Exam- 
ination series. 
                               TABLE 41

           INTERCORRELATIONS OF SUB-SCORES ON THE ORAL TESTS

                      1968 CERTIFYING EXAMINATION

                                N=391

                                    Interpretive   Problem-Solving
Test and Rating Factor                 Skill           Ability        Attitudes

Problem-Solving: Adult Orthopaedics
  Recall                                .83             .77
  Interpretive Skill                                    .86
  Problem-Solving Ability

Problem-Solving: Children's Orthopaedics
  Recall                                .83             .80
  Interpretive Skill                                    .85
  Problem-Solving Ability

Problem-Solving: Trauma
  Recall                                .77             .79
  Interpretive Skill                                    .86
  Problem-Solving Ability

Observation and Interpretation
  Recall                                .77             .77
  Interpretive Skill                                    .71
  Problem-Solving Ability

Simulation Orals
  Recall                                .75             .78
  Interpretive Skill                                    .83
  Problem-Solving Ability

TOTAL: All Five Orals
  Recall                                .88             .88
  Interpretive Skill                                    .90
  Problem-Solving Ability
Concurrent Validity

The Diagnostic Interview. Given the low reliability of this interview
as a separate test (see above), it would not be surprising to find that
it lacks concurrent validity as measured by correlations with ratings.
It is, therefore, of especial interest to note that for third and
fourth year residents the subscore on Diagnosis was the best predictor
of the rating factor "Information Gathering." This result suggests that
instead of rating the process of information gathering, which they
rarely see, chiefs of training rated the product, i.e., the accuracy of
resident diagnosis. In this limited sense, the one subscore on the
Diagnostic Interview may be said to have concurrent validity.

Oral Examinations of Complex Cognitive Behavior. Correlations
between various criterion variables and the total scores on the 4
aspects of competence* assessed in the new types of oral tests of
complex cognitive abilities incorporated in the 1968 Final Certification
Examination are reported in Table 42. The low positive correlations
found between subscores on the oral tests and other assessments
of competence are, in part, due to the error variance in each.
Despite these low reliabilities, differences in the magnitude of the
several correlations (e.g., higher correlations between oral test
scores and supervisors' ratings of cognitive attributes than between
oral test scores and their ratings of psychomotor and affective
behavior) are in the expected direction. More detailed data bearing
on the concurrent validity of these new orals were presented above**
in the discussion of the reliability and validity of supervisors'
ratings; as summarized in Table 30, these data reveal that the cognitive
orals are consistently the best predictors of supervisors'
ratings. It is of special importance, therefore, to note that among
all the oral examinations, scores on the 30-minute Observation and
Interpretation Examination have the highest partial correlations with
[illegible] of the 10 rating factors. This phenomenon is best understood
in light of the factorial complexity of that test as revealed by the
factor analytic studies. Rating factors are factorially complex
simply because people have difficulty abstracting the reasons why
a particular individual performs the way he does. Therefore, an
evaluative technique of similar complexity will show relatively high
correlation with such rating factors.



* These aspects of competence were: Recall, Interpretive Skill,
Problem-Solving Ability and Attitudes.

** See Chapter V.
                               TABLE 42

          CORRELATIONS BETWEEN SUBSCORES ON ORAL EXAMINATIONS
                         AND OTHER EVALUATIONS

                                N=303

                    1968 Certification Examination

                                                                      Scores on Written
                             Supervisors' Ratings                       Examinations
Subscores on                                                        Multiple
All 5 Oral         Problem-   Surgical    Patient     Overall       Choice       Simulations
Examinations       Solving    Technique   Relations   Competence    Total Score  Total Proficiency

Recall               .30        .21         .20         .28           .47           .21
Interpretive
  Skill              .34        .24         .23         .30           .47           .26
Problem-Solving
  Ability            .31        .21         .26         .27           .47           .23
Attitudes            .28        .19         .20         .25           .36           .11

These results do not mean that the other orals are redundant or
that the Observation and Interpretation format should be adopted for
all of them. As revealed by the detailed data reported in Appendix 15,
scores on all the oral examinations make a positive contribution to the
prediction of competence. The way in which each of the orals contributes
to this prediction of criterion variables, and the effects of
unreliability on such predictions, are shown by Table 43.

These data suggest that the cognitive oral examinations could
reasonably be considered parallel versions of the same test and that
the low correlations between criterion variables and subscores on
individual tests are due, in part, to the unreliability of the
latter. This interpretation is supported by the fact that when
similar scores are combined across tests, the size of the correlation
increases. The additive nature of the test data is indicated by
the close similarity in the magnitude of the multiple correlations
between rating variables and test scores and the simple correlations
between rating factors and weighted total test scores (see Table 43).

The degree of agreement in these two sets of correlations suggests
that the weighting of the tests as determined by logical criteria
was quite close to the empirical weighting yielded by the multiple
regression equation.
In summary, the data indicate that each of the tests included
in the 1968 Final Certification Examination makes an independent
contribution to the prediction of competence as that is defined by
supervisors' ratings, and that the factor composition of the tests
is close to the factor weighting that supervisors use in evaluating
their residents.



Finally, it is clear that lengthening each test from which an
independent subscore was derived increased the concurrent validity
of the orals, as estimated from the correlation between test scores
and criterion variables. Using the Guilford technique to correct
the correlation for attenuation due to unreliability, it is found
that with an estimated reliability of .50 for the three problem-solving
orals (probably an overestimate) the corrected correlation
between the rating factor, "Problem Solving," and the Oral Problem
Solving Score is .76. The square of this figure, .68, is the estimate
of the proportion of true common variance between the 1-1/2 hour
Problem Solving Orals and supervisors' rating of problem solving
ability. A substantial amount of the remaining 32% of the variance
is probably associated with the multiple choice test, the written
simulations, the Observation and Interpretation Oral and the Simulation
Oral. These results suggest that whatever supervisors mean
when they rate problem-solving ability is closely related to the
Problem-Solving score on the three Problem-Solving Orals, since the
tests reflect the same factors as observers' ratings of habitual
performance.
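
The correction described above can be sketched as follows. This assumes that the technique referred to is the classical correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities; that identification is an assumption, and every input value below is hypothetical, chosen only to show the arithmetic rather than to reproduce the figures reported in the text.

```python
import math


def correct_for_attenuation(observed_r, reliability_x, reliability_y):
    """Classical correction: r_true = r_xy / sqrt(r_xx * r_yy)."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Hypothetical inputs (not taken from the study):
observed_r = 0.30          # observed test-criterion correlation
test_reliability = 0.50    # assumed reliability of the oral test score
rating_reliability = 0.60  # assumed reliability of the supervisors' rating

corrected_r = correct_for_attenuation(
    observed_r, test_reliability, rating_reliability
)
# Squaring the corrected correlation estimates the share of true variance
# the two measures have in common.
common_variance = corrected_r ** 2

print(f"corrected r     = {corrected_r:.3f}")
print(f"common variance = {common_variance:.3f}")
```

The corrected value rises as either reliability estimate falls, which is why an overestimated test reliability, as the text cautions, yields a conservative corrected correlation.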



In summary, studies of the oral examinations of complex
cognitive behavior incorporated in the 1968 Final Certification
Examination indicate that, as presently constituted, these examinations
predict about as much of the variance in ratings of habitual
performance as can be expected, in view of the inherent
unreliability of both ratings and oral examinations. Further
improvements in these oral examinations will consist in increasing
their reliability by better selection and orientation of examiners,
by increasing the number of cases and/or the number of examiners,
and by utilizing statistical methods to adjust for error. The
results from the Orthopaedic Training Study suggest that the methods
of pooling data, of structuring the examinations, of standardizing
case materials and of training oral examiners described above can
be employed to minimize many of the validity and reliability problems
that plague traditional oral examinations and can thereby increase
substantially the arsenal of techniques available for the
assessment of clinical competence.
Oral Tests of Attitudes:
The Role-Playing Orals

Description of the Examination

The critical incident study had identified the ability to communicate
with, and relate to, patients and colleagues as among the [illegible]
components of competence. As noted above, two role-playing exercises,
the Simulated Diagnostic Interview and the Proposed Treatment
Interview, were the first examinations specifically developed
to assess these aspects of performance. Subsequent to their experimental
introduction on the 1966 oral examinations, the Simulated
Diagnostic Interview was converted from a role-playing to a
problem-solving exercise and was incorporated in the Problem-Solving
Orals; the Proposed Treatment Interview was maintained as a role-playing
exercise and was administered together with two additional
types of simulated confrontations in a separate half-hour
examination. These confrontations [illegible] to modify her handling of
a patient, explaining the failure of a [illegible], a colleague with
whom one disagrees, and the like. In each such exercise, the candidate
was given [illegible] minutes to read a brief description of the
situation, such as that shown in Exhibit V, opposite; he was then given
approximately 7 minutes to talk with the simulated patient or colleague,
whose role was taken by an examiner. The candidate's performance
was rated independently by two examiners on a standardized
rating form (see Appendix 10). Data on the reliability and validity
of the Simulation Orals are reported below.

Reliability of Simulation Orals

As is true of oral examinations of complex cognitive skills, there
are two sources of unreliability in the scoring of oral tests
of attitudes: inter-rater disagreements and sampling errors. Data
on the former, summarized from several studies in Table 44, indicate
a relatively high level of agreement in the independent
ratings of two examiners observing the same examination and
that inter-rater reliability of the examination was not significantly
affected by the increase in length. This somewhat
surprising finding may be accounted for by the fact that in the 1966-67
studies the Proposed Treatment Interview was administered
together with a 20-minute Simulated Diagnostic Interview, and examiners
may have been strongly influenced by performance on the latter in
their ratings of the former, thus, in effect, basing their judgment
on a half-hour sample of candidate behavior.
                              EXHIBIT V

                   Candidate Instructions for Sample
                     Proposed Treatment Interview

As the orthopedist-on-call for the day, you are called to the
emergency room of your hospital to see a patient whom you have never
met before. The patient is a 28-year old male laborer who has a two
day history of severe low back pain which radiates down the left
posterior thigh into the calf; there is also some tingling and numbness
over the left lateral calf. The pain began when he injured his back
while lifting a heavy weight from the floor; he has had no previous
episode of this type. The pain is aggravated by coughing.

On examination there is marked paravertebral muscle spasm; the
patient is listing to the right. There is marked tenderness at the
L4-L5 interspace. Back motion is restricted in all planes. Straight
leg raising on the left produces pain in the left leg at 30° elevation;
straight leg raising on the right produces pain in the left leg at 60°.
There are no reflex changes, no motor weaknesses, and no sensory
changes. Otherwise, the physical examination is within normal limits.

You have reviewed this patient's X-rays, one of which is enclosed.

You suggest to the patient that he enter the hospital for a period
of complete rest and "conservative care." Your suggestion of
hospitalization immediately alarms the patient, who then asks, "What is
wrong with my back, Doctor?"

You will now describe to a simulated patient what is wrong with
"his" back and explain why you proposed your course of treatment. The
laborer's name is Mark Cole.
                               TABLE 44

       RELIABILITY OF ORAL SIMULATIONS BASED ON SEVERAL STUDIES

                                                                      Correlation
                                                                      Between Raters
               Length of                                              Observing Same   Overall
Technique      Exercise      N     Examination                 Mean   Examination      Reliability

Proposed       7-10 min.    383    1966 Orthopaedic             7.1       .72             .84
Treatment                          Certification
Interview                          Examination

Proposed       7-10 min.     30    1966 Orthopaedic             6.9       .55             .71
Treatment                          Certification
Interview                          Examination

Proposed       7-10 min.    387    1967 Orthopaedic             7.0       .61             .76
Treatment                          Certification
Interview                          Examination

Simulated      28-30 min.   391    1968 Orthopaedic             7.7       .73             .84
Interview                          Certification
                                   Examination


                               TABLE 45

            MEAN SCORES OF RESIDENTS AT DIFFERENT LEVELS OF
              TRAINING ON THE PROPOSED TREATMENT INTERVIEW

                  1966 Oral In-Training Examination

    Level            N        Mean*      S.D.

    1st Year         29        6.2        2.8
    2nd Year         75        6.9        2.8
    3rd Year         50        6.5        2.8
    4th Year         75        7.5        [illegible]

    Total           233        6.9        2.6

    * ANOVA indicates a P value < .08.
In the only controlled study of the combined effects of rater and
sampling error on the reliability of the simulation orals, two
different Proposed Treatment Interviews were separately administered
by two examiners to a group of 25 residents in the 1966 In-Training
Examination. The correlation between the scores on the two examinations
was [illegible]; combining the two scores yields an estimated
reliability of .72. This is an astonishingly high value for a 7-10 minute
test and is of special interest in view of the low correlation across
cases in the problem solving orals and the written simulation
exercises. The higher sampling reliability of the Simulation Orals is
probably attributable to the fact that they presuppose much less
specific content than do the Problem Solving Orals and written
simulations.
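
The step from a correlation between two sets of scores to the reliability of their combination is commonly made with the Spearman-Brown formula. The sketch below assumes that formula was used (the report does not say so) and applies it to the inter-rater correlations listed in Table 44; the function name `spearman_brown` is illustrative.

```python
def spearman_brown(r, k=2):
    """Reliability of a composite of k parallel measures whose
    pairwise correlation is r (Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)


# Correlations between raters observing the same examination (Table 44).
inter_rater = [0.72, 0.55, 0.61, 0.73]

# Estimated reliability of the two raters' pooled score, rounded to two
# places for comparison with the "Overall Reliability" column.
pooled = [round(spearman_brown(r), 2) for r in inter_rater]
print(pooled)
```

Under this assumption the pooled values can be compared directly with the "Overall Reliability" column of Table 44, which lists .84, .71, .76 and .84 for the same four rows.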



Unfortunately, it was impossible, for mechanical reasons, to
study either the rater or sampling error of the Simulation Orals
included in the 1968 Final Certification Examination. However, the
two studies reported above strongly suggest that those orals, consisting
as they did of a half-hour session devoted to 3 or more standardized
exercises administered and rated by two examiners, reached an
acceptable level of reliability.



Validity of Simulation Orals

Content validity of the simulation orals was assessed by both
observational analysis and systematic questionnaire studies. The
former, discussed above,* indicates that the behavior required of
examinees in the Simulation Orals was quite unlike that demanded of
them in either the traditional orals or the new Problem Solving Orals.
Furthermore, the observed differences in the nature of examiner-candidate
exchanges in the 3 types of oral formats were in the
hypothesized direction. The results from the questionnaire study** were
similarly encouraging. Specifically, some [illegible]% of the examiners
were convinced that the simulations provided valuable information
concerning the candidate's ability. Approximately two-thirds of the
candidates agreed that the examination gave them a chance to demonstrate
their ability in some important area of orthopaedic surgery, and
almost three-fourths felt that most of the topics covered were
important to orthopaedic practice; only 12% reported that the
examination procedures were confusing, despite the fact that role-playing
was new to most. In short, the role-playing simulations met acceptable
standards of content validity as samples of specified attitudes
and skills in patient and colleague relations.

* See Table 34 and Appendix 28 for additional data on the
observational study.

** See Tables 35 and 36, and Appendices 18 and 19 for detailed
tabulation of examiner and candidate responses in the questionnaire
study.






Construct Validity of the Simulation Orals was investigated in
two separate studies. In the first, conducted in connection with
the 1966 In-Training Examination, the relationship between level of
training and scores on the Proposed Treatment Interview was analyzed.
The results, summarized in Table 45, indicate slight, though not
statistically significant, improvement in performance. Differences
in scores between groups at different levels of training were
substantially greater on both the Diagnostic Interview and the
conventional orals. While this finding may appear to raise doubt about
the construct validity of the Simulation Orals, it should be considered
in light of the fact that most supervisors of orthopaedic training
programs report that they almost never observe their residents dealing
with patients and that little attention is paid to their ability
to relate to, and communicate with, patients. For example, in the
questionnaire study referred to above,* some 40% of the examiners
administering Simulation Orals agreed with the statement: "Most
training programs apparently do not adequately train the candidates
to take this type of examination." In contrast, only 27% of those
administering the Observation and Interpretation Orals, and fewer
than 20% of those administering the Problem-Solving Orals, agreed
with the statement quoted. In short, the findings regarding the
relationship between level of training and performance on the
Simulation Orals are compatible with supervisors' expectations as
based on the nature of the training programs.



The second study of the construct validity of the Simulation
Orals, conducted in connection with the 1968 Final Certification
Examination, consisted in a correlational analysis of the
interrelations among sub-scores on that oral, and between it and
supervisors' ratings of habitual performance. The results, summarized in
Tables 46 and 47, reveal that the intercorrelations of different types
of sub-scores on the Simulation Orals are substantially higher than
those between similar sub-scores on different types of examinations,
and higher than those between the various sub-scores and the relevant
rating factors. In short, the data indicate that the cognitive and
attitudinal sub-scores on the Simulation Orals are not independent.



* See Tables 35 and 36 and Appendices 18 and 19 for
detailed tabulation of examiner and candidate responses in
the questionnaire study.
It appears that in addition to the general factor of
"handling oneself in encounters with others" (a factor which must
saturate not only all the oral examinations, but the rating data
as well), the Simulation Orals measure some aspect of competence
different from that assayed by the Problem-Solving Orals. This
interpretation is based on the hypothesis that if the two types of
examinations were measuring the same types of competence, scores on
the Simulation Orals would be more highly correlated with other
test scores due simply to their higher reliability. Such is not
the case; scores on the Simulation Orals show significantly lower
correlations with scores on multiple choice and written simulation
exercises than do scores on the other oral examinations. While this
finding probably reflects the fact that less specific content is
required in responding to the Simulation Oral than to other orals,
these data do not, in themselves, indicate whether the other
components of competence measured in the Simulation Oral are, in fact,
skill and ability in relating to, and communicating with, patients
and colleagues. In short, the data are compatible with the assumption
of reasonable construct validity of the Simulation Orals, but
are by no means conclusive in establishing it.



Concurrent Validity of the Simulation Orals was investigated
in two studies of the relationship between scores on that examination
and supervisors' ratings of various aspects of trainees'
habitual performance. Though this method was employed in the estimation
of the concurrent validity of all assessment techniques developed
in the study, its application in the evaluation of the Simulation
Orals presented certain special difficulties that should be noted.
Specifically, in addition to the inherent unreliability of both the
rating data and the scores on oral tests, the rating data, in this
case, lacked validity for the following reasons:

Supervisors rarely observe residents with patients; hence to
the extent that rating data are distorted by "halo" effects, ratings
will be strongly influenced by what supervisors regard as problem-solving
ability rather than by a general factor of ability to relate
to patients and colleagues. Similarly, because of limited insight
and experience in analyzing the dynamics of interpersonal relationships,
some examiners have difficulty rating these factors in the
oral tests and thus, given the "halo" effect, tend to identify
communicative ability with problem solving ability. Finally, because
of the artificiality of the situation and the constraints placed on
it by the presence of examiners, examinees may feel that they have
little opportunity to demonstrate their ability to relate to, and
communicate with, patients and colleagues. It is probable that all
of these factors play a part in depressing the correlation between
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Mv-l, o 5 i ,J„ lo ho ii,<a„do,i r, a rejoin, ,„,rl of tl.o oort :i f ;i oaU on 

c-Xfiiiii nui :i on, ih.vjur orjo.it should !>,.• dr-vo U-c'l lo jmj>i ovemont j n n, ( . 
fiooring nyr. tom um.i! and lo Urn clovel ojm.mil of i oclm.iqucm that. would 
y.i(;.!d jifl l or crJorjoii data for af : f oc-l .i vo varj abler;. 



Oral Tests of Attitudes and Skills:
The Simulated Staff Conference

One additional oral technique, incorporated experimentally in the
1966 and 1967 Final Certification Examinations, consisted in a simulated
staff conference in which five candidates discussed one or two cases.
Candidates were given five minutes to study the protocol of the cases*
and 22 minutes to discuss them. One or more examiners observed and
rated the entire proceedings, without commenting on them or
participating directly in any way. Two different rating forms were used for
scoring these conferences. In the first, adapted from the work of
Bass,3 the candidate was rated on four factors: Individual
achievement, Ability to assist the group to reach its goals, Effective
conduct as member of a group, and Overall competence.** In the second
scoring form, subsequently adapted from Bales'4 technique, the
examiner was directed to classify and tally each statement by each
participant on the following scale:



     0    Error (in content)
     1    Hinders group
     2    Is passive, non-facilitative
     3    Clarifies and provides constructive suggestions
     4    Organizes, integrates, greatly facilitates



On the basis of the quality and quantity of the contributions
tallied for each candidate, the examiner was expected to rate his
overall competence and to record this judgment on the familiar
12-point scale.*** Studies of the reliability and validity of this
technique are summarized below.

*   See Appendix 20 for a typical case protocol.
**  See Appendix 12 for a copy of this form.
*** See Appendix 11 for a copy of this form.

Reliability of the Simulated Staff Conference

No study of inter-case reliability was conducted; in the one study
of interrater agreement the correlation between the scores assigned by
two examiners observing the same conference was found to be .14 for the
383 candidates rated in the 1966 Final Certification Examination; this
would yield an estimated reliability of .25 for the pooled scores of the
two examiners. The results for each of the three examining teams
involved in that examination are summarized in Table 48. These
data indicate that though the level of agreement was not high for
any team, one set of examiners was apparently in complete
disagreement on standards.
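
The step from an interrater correlation of .14 to a pooled reliability of .25 is consistent with the Spearman-Brown formula, although the report does not name the method used, so its use here is an assumption. The sketch below shows the arithmetic; the function name `spearman_brown` is illustrative only.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Estimated reliability of the pooled score from k parallel ratings,
    given the reliability (here, interrater correlation) of one rating.

    Assumption: the report's pooled figure was obtained this way;
    the text itself does not name the formula.
    """
    return k * r_single / (1.0 + (k - 1.0) * r_single)


# Two examiners whose scores correlate .14 (all 383 candidates).
pooled = spearman_brown(0.14, 2)
print(round(pooled, 2))  # 0.25
```

The same function with a larger `k` indicates how many independent ratings would be needed before the pooled score reached a usable level of reliability.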



For this reason, modifications were made in the scoring
technique as described above and the examination was repeated in the
1967 Final Certification Examination. However, the Board was able
to assign only one examiner to administer and score each oral; it
was, therefore, impossible to carry out any reliability studies on
the new scoring methods.

Validity of the Simulated Staff Conference

Content Validity of this oral technique was assessed on
the basis of reports from candidates and examiners, many of whom
agreed that the simulated discussions preserved much of the "feel"
of staff conferences commonly held during residency. Some have
criticized the technique as being too artificial and predicted that
no candidate, however rude and tactless in a "real situation,"
would display these characteristics in the simulated situation, a
fear that has not been justified. Finally, some have criticized
the technique as sampling situations which, though common during
residency training, are not characteristic of practice.

Construct Validity of the Simulated Staff Conference was
studied in the 1966 Final Certification Examination. The data,
summarized in Table 49, reveal that, despite low interrater
agreement in scoring those exercises, the pattern of correlation between
scores on that test and scores on other types of tests does not
differ significantly from that characteristic of other simulation
techniques (the Diagnostic Interview and Proposed Treatment
Interview). Similarly, the factor analytic studies of the 1966 Final
Certification Examination suggest that the factor structure of the
Simulated Staff Conference does not differ significantly from that
of the more conventional orals.

TABLE 48

                                        Correlation Between
Examining                  Mean         Scores of
Team            N          Score        Two Examiners

1              144          8.2              .48
2              123          7.6              .30
3              116          8.1             -.49

TOTAL          383          8.0              .14

TABLE 49

CORRELATION OF SCORES ON SIMULATION
ORALS WITH THOSE ON OTHER TECHNIQUES

1966 Final Certification Examination

[Table body not legible in the source. Columns: Simulated Staff
Conference,* Diagnostic Interview, Proposed Treatment Interview.
Rows: Grand Total; Multiple Choice; Short Answer; Total Orals;
Traditional Oral Examinations (Pathology, Children's Orthopaedics,
Anatomy, Trauma, Adult); Written Simulation.]

* Not included in Grand Total and Total Oral Scores

... not to include the Simulated Staff Conference as a regular part
of the Certification Examination ...; they were supported by the
following reservations shared by numerous examiners:

(1) Some were not convinced of the importance of the behavior
the simulation was designed to assess;

(2) Some were uneasy about making a decision about a candidate
on the basis of his [...]-minute participation in a group
discussion; and many felt that the constitution of the group
affected each person's performance and thus feared that an individual
might look bad because of the group he was in and not because of any
real weakness on his part.

In short, the Simulated Patient Management Conference proved
to be an interesting technique that had to be abandoned due to
inadequate rater reliability and to lack of acceptance from the
profession; nevertheless, work done by Bass3 and others indicates
that the technique may have great usefulness which has not, as yet,
been exploited by the orthopaedic profession.

Summary

The assessment of each new type of oral examination recommended
for incorporation in the certification examinations included
considerations of both its reliability and its validity. With respect
to the former, every effort was made to estimate error variance due
both to interrater disagreements and to sampling errors. Except for
the Simulated Staff Conference, all types of exercises met adequate
standards of interrater reliability. Greater difficulties arose in
regard to inter-case reliability, particularly in the Problem-Solving
Orals in which command of specific content was more critical than in
the Simulation Orals. With respect to validity, every effort was
made to investigate the content, construct and concurrent validity
of the new techniques. Observational and systematic questionnaire
studies indicated that, with the possible exception of the Simulated
Staff Conference, all techniques met adequate criteria of content
validity. Results of correlational and factor analytic studies of
construct validity suggested that, in general, scores on the new
techniques were associated with each other and with scores on more
conventional examinations in the hypothesized manner. Factor analytic
and multiple regression studies indicated that, in general, scores on
the new techniques were related to the criterion measures [...].
However, due to unreliability of both test and criterion variables, the
predicted association was, in several cases, weaker than anticipated.

On the basis of these data, the American Board of Orthopaedic
Surgery has completely reconstituted its oral examinations to include
1 1/2 hours of problem-solving exercises with each half-hour
administered by a different examiner; a half-hour of observation and
interpretation exercises administered by a fourth examiner; and a
half-hour of simulated patient and colleague confrontations administered
by a team of examiners. Standardized case materials and standardized
scoring forms are used for all exercises. The same four factors
(recall, interpretive skill, problem-solving ability, and attitudes
and communication skills) are rated in each set of exercises;
differentially weighted scores** on each factor are pooled across all
examinations. These innovations in the nature of the oral
examinations and the method of scoring them are direct outcomes of the
study to date.



1. Ebel, Robert, "Estimation of the Reliability of Ratings," in
   Principles of Educational and Psychological Measurement, ed. by
   Mehrens & Ebel (Rand McNally Co., 1967).

2. [Author and title illegible], McGraw-Hill Book Co., Inc.,
   New York, 1956.

3. Bass, Bernard, "The Leaderless Group Discussion," Psychological
   Bulletin, 1954, Vol. 51, pp. 465-491.

4. Bales, R. Freed, Interaction Process Analysis, Cambridge:
   Addison-Wesley, 1951.

CHAPTER VIII:

THE MULTIPLE CHOICE QUESTIONS

The Multiple Choice technique was discussed in some detail above
in connection with the analysis of previous evaluation techniques.*
While many of the multiple choice questions used in 1964 were still
relevant in 1968, numerous modifications occurred in the interim with
respect to procedures for developing, reviewing and scoring such
exercises. As noted above, the two serious weaknesses in the multiple
choice technique revealed by the earlier analysis were: (1) the tendency
in those exercises to focus on measuring the recall of isolated bits of
information that often appeared to have little relation to any
behavior required in the practice of orthopaedics; and (2) the tendency to
set the passing mark on the basis of the distribution of scores, rather
than on the basis of pre-determined standards, a tendency which, by
punishing an arbitrary number of examinees for scoring at the bottom
of the distribution, violates the mission of any Board established to
determine whether an individual meets professional standards of
competence.

To alleviate some of these problems and to encourage question
authors to submit case-oriented materials that demand application
of principles rather than simple recall of isolated facts, the Board
developed an item classification guide with instructions that each
question submitted for the certifying examination must be suitable
for meaningful classification with respect to at least four of the
following five dimensions:**

     I.    Type of patient (adult or child)
     II.   Type of disorder (trauma or disease, etc.)
     III.  Part of body (upper extremity, etc.)
     IV.   Basic science (anatomy, etc.)
     V.    Clinical (diagnosis, etc.)

Second, utilizing these dimensions, together with a taxonomy of
intellectual processes, the Board constructed a blueprint*** for
the overall examination which stipulates the proportion of the total
examination to be devoted to each type of intellectual process and
to each category within each of the five dimensions listed.

Third, the Board established a Task Force on New Multiple
Choice Questions and gave it the clear charge to develop a system
for constructing and submitting questions that would assure a
larger proportion at higher taxonomic levels.

*   See Chapter III.
**  See Appendix ... Guide.
*** See Appendix ... for a ...



With the development of the item classification guide and the
blueprint, it is now possible to direct staff to pull questions of
known characteristics from the item pool to fit the specifications
dictated by the blueprint for a given examination. This initial
draft, approximately 50% longer than will ultimately be required,
is circulated to the entire Board. Each member of the Board is
requested to respond to each question, to classify it according to the
intellectual process it samples and to comment on its merits. The
examination committee of the Board then meets, reviews the responses
of the Board and makes the final selection of questions for the
examination.
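
The drafting step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Board's actual procedure: the blueprint is reduced to a single dimension (intellectual process), the category names and the function `draft_examination` are hypothetical, and items are drawn at random within each category.

```python
import random
from collections import defaultdict


def draft_examination(item_pool, blueprint, final_length, overdraw=1.5, seed=0):
    """Pull a first draft from a classified item pool.

    item_pool:    list of dicts, each with an 'id' and a 'process' label
                  (hypothetical field names).
    blueprint:    {process label: proportion of the examination}.
    final_length: number of questions wanted in the final examination.
    overdraw:     1.5 gives a draft about 50% longer than finally required,
                  leaving room for the committee to reject questions.
    """
    rng = random.Random(seed)
    by_process = defaultdict(list)
    for item in item_pool:
        by_process[item["process"]].append(item)

    draft = []
    for process, proportion in blueprint.items():
        wanted = round(final_length * overdraw * proportion)
        candidates = by_process[process]
        if len(candidates) < wanted:
            raise ValueError(f"item pool too small for category {process!r}")
        draft.extend(rng.sample(candidates, wanted))
    return draft


# Hypothetical pool: 40 recall items and 40 problem-solving items.
pool = [{"id": i, "process": "recall"} for i in range(40)]
pool += [{"id": 100 + i, "process": "problem_solving"} for i in range(40)]

draft = draft_examination(pool, {"recall": 0.5, "problem_solving": 0.5}, 20)
print(len(draft))  # 30
```

In practice the blueprint fixes proportions on several dimensions at once, so a real selection would have to satisfy all of them jointly; the single-dimension version is meant only to show the over-drawing of the draft.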



Finally, prior to the administration of the test, the Board
establishes standards of minimum satisfactory performance.* For the
multiple choice questions this entails an adaptation of the Nedelsky1
technique, in which members of the Examination Committee review each
question and check each option which the barely passing candidate should
be able to eliminate. The reciprocal of the remaining number of
options is taken as the "minimum passing level" (MPL) for that
question. Thus, for example, if a question has five options and two are
eliminated, then the chances that a minimally competent candidate
would get the right answer by guessing would be one in three or .33.
If the whole test consists of such questions, the barely passing
candidate should score at least 33%. The average of the MPL's for all
of the questions in an examination is the best estimate of the score
that a candidate would get if he eliminated all of the alternatives
that informed judges think he should be able to exclude and selected
among the others by guessing. Employing this technique, it is
possible to transform the scores to any scale for combination with other
test data. In the present study, all scores and sub-scores for both
oral and written tests were converted to a 12-point scale in which
the MPL was always defined as 3.5.
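
The computation of the minimum passing level can be sketched as below. The function names `item_mpl` and `examination_mpl` and the three-question example are illustrative assumptions; only the rule itself (reciprocal of the options remaining, averaged over questions) is taken from the text.

```python
from typing import Sequence, Tuple


def item_mpl(n_options: int, n_eliminated: int) -> float:
    """Minimum passing level for one question: the reciprocal of the number
    of options left after the judges mark those a barely passing candidate
    should be able to eliminate."""
    remaining = n_options - n_eliminated
    if remaining < 1:
        raise ValueError("at least one option (the correct one) must remain")
    return 1.0 / remaining


def examination_mpl(items: Sequence[Tuple[int, int]]) -> float:
    """Average of the item MPLs; items are (n_options, n_eliminated) pairs."""
    return sum(item_mpl(n, e) for n, e in items) / len(items)


# The example in the text: five options, two eliminated.
print(round(item_mpl(5, 2), 2))  # 0.33

# A hypothetical three-question examination.
print(round(examination_mpl([(5, 2), (5, 3), (4, 2)]), 2))  # 0.44
```

The conversion of raw scores to the 12-point scale is not sketched here, since the text specifies only that the MPL maps to 3.5 and not the rest of the transformation.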



Application of these pre-determined standards in the 1968 Final
Certification Examination resulted in a failure rate on the
Multiple Choice Recall Sub-Score of approximately 50% even among
graduates of American Medical Schools who were taking the
examination for the first time. However, when the sub-scores on the
Multiple Choice component were combined with data about
performance on other types of exercises, the failure rate fell
substantially.

Data on the reliability and validity of the newly revised
Multiple Choice examination are reported in the following sections.

* See Appendix 26 for a detailed description of the system of
setting absolute standards.



Reliability of the Multiple Choice Examination

The great strength of the multiple choice technique is its
consistently high reliability: In tests which are carefully constructed,
and are composed of questions that have been widely reviewed, rating
errors are minimized, and sampling reliability is assured by the fact
that in the typical examination it is possible to use large numbers
of completely independent items. Table 50, which summarizes the
reliability data on the multiple choice examinations used by the
orthopaedic profession over the last few years, reveals that the estimated
reliability of these examinations varies directly with the number of
items and indicates that sub-scores, based on relatively few items,
are not sufficiently reliable to use independently in the
certification process.



Validity of the Multiple Choice Examination

Content Validity

The Task Force on New Multiple Choice Questions was given the
charge to develop questions at higher taxonomic levels for the Board
examinations; the methods it established for constructing and
reviewing new questions have resulted in an increased proportion of items
being rated by authors and reviewers as sampling higher cognitive
processes. However, despite these improved procedures, data from the
questionnaire study of examiner and candidate reaction to the new
examinations (see Table 51 and Appendix 19) indicate that
candidates, successful and unsuccessful alike, found the multiple choice
component least relevant and least appropriate. They were especially
critical of many questions which seemed to them to be ambiguous or
to demand information that is remote to their needs as practitioners.
Some support for their views is to be found in the fact that the
relatively reliable Multiple Choice Recall test is less useful than the
relatively unreliable orals as a predictor of such criteria as
supervisors' ratings.

* See Chapter X, below, for a detailed account of the scoring
procedures and results of the 1968 Certification Examination.
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Given the candidates' criticism of the multiple choice examina- 
tion, and its lack of association with supervisors' ratings it is 
always a little surprising to find, as revealed in Table 52, that 
there is a consistent growth, over the four years of orthopaedic 
residency training, in the abilities measured by the Multiple Choice 
questions. While differences in mean scores for groups at different 
levels of training are in the expected direction on all sub-tests, 
the amount of growth is far from uniform in the various disciplines. 



TABLE 53

DIFFERENCES IN MEAN SCORES OF FIRST AND FOURTH YEAR RESIDENTS
1967 IN-TRAINING EXAMINATION

                                                   Difference Divided by
Subtest                        Difference in %     Standard Deviation

General Orthopaedics                12.4                  1.3
Adult Orthopaedics                  11.8                  1.4
Children's Orthopaedics             15.9                  1.7
Trauma                              12.6                  1.4
Hand Surgery                        16.3                  1.5
Anatomy                             17.3                  1.6
Pathology                           14.0                  1.5
Physiology and Biochemistry          5.5                  0.7
Biomechanics                        16.4                  1.5
Rehabilitation                      18.7                  1.3

Total                               13.3                  1.8



It is interesting to note (see Table 53) that much greater improvement
occurs in scores on Pathology and Anatomy, disciplines which are
directly applicable to clinical problems in surgical specialties, than in
scores on Physiology and Biochemistry. Such results suggest either
that the latter content areas are less effectively handled in the
training programs or, alternately, that the examination questions in these
disciplines are not probing important areas of competence. Perhaps
both hypotheses have an element of truth. In an extensive review of
the individual questions that discriminated most between first and
fourth year residents, Dr. Huncke2 observed that the most
discriminating questions were those involving complex multi-system widespread
entities, such as meningomyelocele, cerebral palsy and scoliosis. He
adds: "This would seem logical since these are complicated situations
that do require a considerable amount of training and understanding to
proceed with any amount of accuracy. It should be noted, however, that
there were questions involving these entities that, in my opinion,
could have been adequately answered if the individual correlated basic
science knowledge which he is presumed to have, particularly his
knowledge of functional anatomy. Judging from the results, however, this
was not often done by the candidates, who seemed to approach these
particular complex problems as if they demanded recall rather than
application."



Data on a second aspect of construct validity, i.e., the
relationship between performance on the Multiple Choice Examination and on
other evaluative techniques, are summarized in Tables 54 and 55. These data
clearly indicate that sub-scores on the Multiple Choice examination
behave in the expected manner in relation to other test scores, and
strongly suggest that the Multiple Choice test measures, primarily,
cognitive functioning. For example, the score on Multiple
Choice-Recall has a very low correlation with the rating factor, "Patient
Relationships", and with scores on the Simulation Orals. As compared
with the Recall score, that on Multiple Choice-Problem Solving, when
corrected for attenuation, has a generally higher correlation with
all types of assessment other than the rating factor, "Surgical Technique",
and certain of the Written Simulation Scores. These exceptions may be
due either to random errors of measurement, or to the fact that both
the rating factor and the written simulations include important
non-cognitive factors of temperament and skill which are not sampled by
multiple choice techniques. Further, it is of interest to note that
when corrections are made for unreliability the correlation between
the Multiple Choice-Recall and the Multiple Choice-Problem Solving is
.89; i.e., about 80% of the variance in the two tests is common.
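
The two calculations in this paragraph can be sketched as follows. The note to Table 54 attributes the correction to Guilford; the sketch assumes this is the standard correction for attenuation (the observed correlation divided by the square root of the product of the reliabilities). The reliability value in the first example is hypothetical, chosen only to show the arithmetic.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float = 1.0, rel_y: float = 1.0) -> float:
    """Estimate the correlation that would be observed if the measures were
    perfectly reliable. Leave a reliability at 1.0 to correct for the
    unreliability of the other measure only."""
    return r_xy / math.sqrt(rel_x * rel_y)


def shared_variance(r: float) -> float:
    """Proportion of variance common to two measures correlated r."""
    return r * r


# Hypothetical case: observed r = .30, sub-test reliability = .64.
print(round(correct_for_attenuation(0.30, rel_x=0.64), 3))  # 0.375

# The figure in the text: a corrected correlation of .89.
print(round(shared_variance(0.89), 2))  # 0.79
```

A corrected correlation of .89 therefore corresponds to roughly 79 per cent common variance, which the text rounds to "about 80%".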



Finally, additional data on the construct validity of the multiple
choice technique are furnished in the factor analyses of the 1966
In-Training Examination and the 1968 Final Certifying Examination (see
Tables 6, 7, and 56), both of which indicate that the multiple choice
scores load heavily on one factor which appears to be a content or
information factor; in contrast, other techniques show a much more
complex factor structure.

TABLE 54

CORRELATIONS BETWEEN MULTIPLE CHOICE SUBSCORES
AND OTHER SELECTED VARIABLES

1968 Final Certification Examination

                               Multiple Choice        Multiple Choice
                                   Recall             Problem Solving
Other Variables               Actual  Corrected*     Actual  Corrected

Rating Factors
  Information Gathering         .25      .30           .20      .37
  Problem Solving               .31      .37           .26      .48
  Clinical Judgment             .23      .27           .19      .35
  Surgical Technique            .16      .19           .10      .19
  Patient Relationships         .09      .11           .08      .15
  Cont. Resp.                   .16      .19           .12      .22
  Emergency Care                .17      .20           .15      .28
  Overall Relationships         .15      .18           .12      .22
  Ethics                        .19      .23           .12      .22
  Overall Competence            ...      .33           .24      .44

Oral Tests
  Adult                         .27      .37           .20      .37
  Child                         .15      .18           .07      .13
  Trauma                        .30      .36           .23      .43
  Interpretive                  .22      .26           .21      .39
  Simulation Attitudes          .12      .14           .14      .26

Written Simulation
  Diagnosis Proficiency         .21      .25           .17      .31
  Treatment Proficiency         .19      .23           .11      .20
  Total Proficiency             .26      .31           .17      .31

Multiple Choice
  Recall Actual                 --       --            .41      .76
  Recall Corrected              --       --            .49      .89

* Corrected by Guilford technique for attenuation, due to
unreliability of the Multiple Choice Sub-Tests.





TABLE 55

CORRELATIONS BETWEEN MULTIPLE CHOICE SCORES AND
OTHER SELECTED VARIABLES BY YEAR OF TRAINING

1966 In-Training Examination

                           First and   Third and    Total All Four Years
                           Second      Fourth             N=228
                           Years       Years                 Corrected for
                           N=109       N=119                 Unreliability
Other Variables            Actual      Actual      Actual    on MC Section

Rating Factors
  Recall                     .25         .29         .31         .33
  Problem-Solving            .23         .25         .26         .28
  Information-Gathering      .19         .43         .33         .35
  Clinical Judgement         .23         .24         .26         .28
  Patient Relations          .13         .09         .19         .20
  Colleague Relations        .09         .09         .07         .07
  Surgical Skill             .13         .26         .14         .15
  Ethics                     .17         .29         .18         .19
  Overall                    .20         .26         .26         .28

Other Test Scores
  Diagnostic Interview       .17         .20         .26         .28
  Proposed Treatment
    Interview                .28         .23         .27         .29
  Traditional Adult Oral     .23         .36         .44         .47
  Written Simulations
    Total Proficiency        .05         .18         .01         .01



Concurrent Validity 



The two major studies of the concurrent validity of the multiple
choice technique were conducted on the 1966 In-Training and the 1968
Certification Examinations. Despite the fact that the correlations
reported in Tables 54 and 55 between scores on the multiple choice test
and supervisors' ratings or other test scores are generally rather low,
differences among the several values are in the expected direction.
Specifically, the same general pattern of relationships characterizes
both the 1968 certification and the 1966 In-Training data; this
general pattern is one in which, despite validity and reliability
problems in the ratings that tend to depress all correlations in the matrix,
the multiple choice scores are significantly more closely related to
ratings of cognitive components of competence than to ratings of skills
or affect. Similar patterns characterize the relationships between
scores on multiple choice tests and scores on other written and oral
tests.

These data indicate that the pattern of test score predictors
shifts from one rating factor to another despite the fact that the
intercorrelations among ratings on the several factors are quite high.



TABLE 56

RESULTS OF MULTIPLE CORRELATIONAL ANALYSIS

1968 CERTIFYING EXAMINATION

N = 391

Rating Factors as    Multiple         Test Scores as                Partial
Dependent Variables     R       F     Independent Variables            r       F

Information            .36    5.13    Multiple Choice Recall          .12    5.56*
Gathering                             Observation and Interpretation
                                        (Interpretation)              .11    4.48*
                                      Trauma - Problem Solving        .10    3.84

Overall                .36    5.13    Observation and Interpretation
Competence                              (Interpretation)              .12    5.48*
                                      Multiple Choice Recall          .12    5.46*
                                      Multiple Choice Problem
                                        Solving                       .09    3.13

*  Significant at .05 level of confidence
** Significant at .01 level of confidence



As regards the concurrent validity of the Multiple Choice Test, it is
important to note that it is the best predictor of ratings on
"Information Gathering," the second best for ratings of "Problem Solving,"
the third best for ratings on "Clinical Judgment," and disappears as
an important predictor for rating factors related to affective
behavior, i.e., "Patient Relationships" and "Colleague Relationships"
(Table 56).

Secondly, when appropriate adjustments are made for differences in the
reliabilities of candidate and resident ratings, it appears that the
Certification Examination identifies a much larger proportion of the
true variance in the criterion data than does the In-Training
Examination (Table 57).




TABLE 57

COMPARISON OF THE PREDICTORS OF OVERALL COMPETENCE
1968 FINAL CERTIFYING EXAMINATION AND
1966 IN-TRAINING EXAMINATION

                                                           Multiple R
                         Reliability     Multiple R        Corrected for
                         of Rating       (All Test         Unreliability
Examination              Factor          Scores)           of Rating Factor

1966 In-Training
Examination                 .73             .32                 .38

1968 Final Certifying
Examination                 .31             .34                 .65



This is probably attributable, in part, to improvements between 1966
and 1968 in the Multiple Choice section of the examination and
specifically to the development of the problem solving subtest in the Multiple
Choice format, as well as to revisions and extensions of the oral test
techniques.

Summary Comment

In summary, the multiple choice technique, as modified in the current
study, provides valuable information on certain facets of competence in
orthopaedic surgery, as these are defined by supervisors' ratings.
However, it is necessary to supplement this method of assessment with other
techniques in order to obtain valid and reliable data on all aspects of
competence in the specialty. Such supplementation is of special value
in evaluating those areas of competence that involve affective behavior.

1 Nedelsky, Leo, "Absolute Grading Standards for Objective Tests",
  Educational and Psychological Measurement, vol. 14, Spring, 1954,
  pp. 3-19.

2 Huncke, Brian H., M.D., Memorandum, "Review of Discriminating
  Questions in the November, 1967 In-Training Examination", May 14,
  N.P.
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CHAPTER IX: 

ft MODEL FOR THE EVALUATION OF COMPETENCE*- ~ 
ft SYNTHESIS OF THE NEW TECHNIQUES 

ev .,„,,., / In tho earlier chapters in this section each of the 

Training stndv qU ? deVel0p0d £oK uoa in «» Orthopedic 
po"?blo noL di8CU * 6ed ®oparately in , 0 for as that was 

adequately I .'r° ev T u f ion technique can be considered 
ev I i-M, y • olatxon, m developing a rational system for 
evaluating complex professional behavior it is neceLarv to 
consider the contribution which each possible- technique male. 

^is cha a oL UraCy T c ° mPletenCSS 0f overall aasZLZ 

teVrelationsh^s ntZ <levotea to an analysis of the in- 

prediction of T several examination techniques in the 

prediction of overall competence In Orthopedic Surgery. 

Malytic.,. va. Sy nthetic Techniq ue a 

Any test represents merely a sample of behavior; as such, it is
possible either to sample some small aspect of behavior that is thought
to be an important component of total performance (e.g., muscle
coordinations) or, alternatively, to sample holistically a given slice
of total behavior (e.g., the 100-yard dash). The diagram below indicates
the location of the several approaches employed in the Orthopedic study
on a continuum from analytic to synthetic methods.

[Diagram: examination techniques arranged from Analytic Methods to
Synthetic Methods: Multiple Choice; Oral Defense of Therapy; Oral
Interpretive Skills; Oral Diagnostic, Emergency Treatment, and
Complication Problems; Written Simulation of Treatment; Written
Simulation of Diagnosis; Simulation Orals]

As the diagram suggests, the multiple choice technique is the most
analytic; it represents an attempt to break total behavior down into its
component parts and to sample each bit separately. Such techniques tend
to be highly reliable, since with them it is possible to obtain
independent measures of many small bits of behavior in a relatively
short period of time. However, some behavioral traits cannot be validly
measured by analytic techniques; for example, [...] in others,
decisiveness in taking radical [...] The low intercorrelation among
different types of oral or of written problems is, in part,
attributable to the fact that problems require different professional
qualities; hence the synthetic exercises tend to be more valid, but less
reliable. It is for this reason that a variety of examination techniques
ranging from analytic to synthetic has been included in the orthopaedic
certification examination.

Contribution of Each Method
to the Prediction of Overall Competence



One might ask, however, if all the modes of examining that were
approved for inclusion in the regular certification process are really
necessary; i.e., does each make some contribution to the prediction
of competence? The multiple regression study of the 1968 Certification
Examination is reassuring on this point. The results,* summarized in
Tables 58A and 58B, indicate that every technique makes some additional
contribution to the prediction of Overall Competence as that is defined
by supervisors' ratings. Indeed, if the Multiple Choice technique
(which has the highest simple correlation with supervisors' ratings of
"Overall Competence") is taken as the starting point, the addition of
the score on Interpretive Skill from the oral examinations and that on
proficiency in the written simulations increases by about 50% the
amount of common variance between test scores and supervisors' ratings.



However, it may be argued that even with this increase the value of
the combined tests as predictors of overall competence is exceedingly
limited since they account for only about 12% of the variance in those
ratings. This criticism would have considerable merit if the purpose
of the tests were to predict the criterion scores. However, such is not
the case; their purpose is to identify lack of competence. For this
purpose it may be argued that, given the amount of error variance in
the ratings (reliability ~ .31) and the amount of true variance
assignable to factors (e.g., surgical skill) that none of the exercises
was designed to measure, the data strongly suggest that the combined
battery of tests predicts about as much of the remaining true variance
in overall competence as could reasonably be expected from the limited
sample of behavior it is possible to collect in a 6-7 hour examination
situation.
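
The arithmetic behind this argument can be sketched as follows. The
three input figures (12%, 50% and .31) are taken from the text; the
function names are illustrative, and the interpretation of reliability
as the share of rating variance that is true variance is the usual
classical test theory assumption, not a computation reported in the
study. The sketch ignores the further true variance assignable to
unmeasured factors such as surgical skill, so the resulting share is,
if anything, an underestimate of what the battery achieves.

```python
def implied_baseline(combined_r2, relative_gain):
    """Variance explained before the gain, given the combined value."""
    return combined_r2 / (1.0 + relative_gain)


def share_of_reliable_variance(r2, criterion_reliability):
    """Fraction of the criterion's true-score variance that is explained.

    Classical test theory treats reliability as the proportion of
    observed variance that is true variance, so it caps the variance
    any predictor can be expected to explain.
    """
    return r2 / criterion_reliability


combined_r2 = 0.12         # "about 12% of the variance" (from the text)
relative_gain = 0.50       # "increases by about 50%" (from the text)
rating_reliability = 0.31  # reliability of the ratings (from the text)

baseline_r2 = implied_baseline(combined_r2, relative_gain)
reliable_share = share_of_reliable_variance(combined_r2, rating_reliability)

print(f"Implied Multiple Choice R^2: {baseline_r2:.2f}")
print(f"Share of reliable rating variance explained: {reliable_share:.2f}")
```

Under these assumptions the Multiple Choice score alone would account
for roughly 8% of the variance in the ratings, and the combined battery
for somewhat more than a third of the variance that the ratings can
reliably carry.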



Factor Structure of the Test Battery 



A second approach to the determination of the value of multiple 
assessment techniques consists in the analysis of the factor structure 
of the combined battery of tests and rating scales incorporated in the 
certification process. In this context, the factor analytic studies 
made of the 1966 In-Training Examination and the 1968 Certification 
Examination may be briefly reconsidered here. In the earlier study 
the three following clearly identifiable factors emerged:**



Factor I     A cognitive factor, probably predominantly "recall"
             behavior, on which the traditional Multiple Choice
             and oral examinations loaded heavily, and certain
             scores on the simulated treatment interview and the
             staff conference had moderate loadings.

Factor II    A reasoning factor, probably predominantly "persistence"
             in inductive inquiry, on which the written
             simulations of diagnostic problems loaded heavily
             and the score on the multiple choice test had
             moderate loadings.

Factor III   A style or temperament factor, probably involving
             decisiveness, on which the simulation orals loaded
             most heavily in a negative direction and written
             simulations of treatment problems had moderate
             positive loadings.

* See also Appendix 15.
** See also Table 6.



The factor analytic study of the 1968 Certification Examination
revealed a similar, but substantially more complex factor structure
in which 5 factors emerged. One of these additional factors is almost
certainly attributable to the inclusion of supervisors' ratings and
the second is probably due to the inclusion of certain sub-test scores,
the most important of which was the subscore on the written simulations
representing skill in avoiding harmful procedures. With the
incorporation of these additional measures in the analysis, the
following 5 factors were identified:*



Factor I     A general ability factor, on which all ratings have
             relatively high loadings. It is interesting that
             among the test scores, the score on Interpretive
             Skill from the Observation and Interpretation Oral
             has the highest loading on this factor, suggesting
             that the confrontations in which chiefs make
             judgments about residents are often focused around
             discussion of x-rays and other diagnostic tests or
             clinical findings.






Factor II    An information or content factor (similar to
             Factor I in the 1966 study).

Factor III   A factor related to inductive reasoning (similar
             to Factor II in the 1966 study) on which the scores
             on selection of useful procedures in the written
             simulations and on the Observation and Interpretive
             orals have heavy loadings.

Factor IV    A factor of skill in oral communication on which
             the Simulation Orals have heavy loadings and all
             orals have at least a moderate loading.

Factor V     A factor related to decisiveness and efficiency
             (similar to Factor III in the 1966 study) on which
             the score on avoiding harmful intervention on the
             written simulations has high positive loadings and
             that on selecting indicated procedures has high
             negative loadings.

* See Table 7.



It is also worth noting that, for the most part, each of the
tests included in the 1968 Certification Examination and each of the
major scores derived from it had moderate to heavy loadings on several
factors. For example, the score on Interpretive Skill derived from the
oral examinations showed at least moderate loadings on all 5 of the
factors; that on Written Simulations of diagnostic problems on 3
factors; those on the problem-solving orals in adult orthopaedics
and in trauma on 2 factors. This factorial complexity of the 1968
examination is especially significant in view of the general tendency
for different techniques to emerge as independent factors, partly
because each type of test samples some technique-specific abilities
(e.g., persons with high verbal facility perform well on oral
examinations) and partly because with the inclusion of sub-scores in the
analyses (as in the 1968 study) the halo effect (particularly in the
supervisors' ratings and oral examination scores) so increases the
correlation between related sub-scores that there is a strong tendency
for each set of 4 sub-scores to cluster around separate independent
factors.

Summary Comment

The data presented in this and preceding chapters strongly suggest
that competence in orthopaedics is multifactorial and that a variety of
techniques is required to provide valid and comprehensive information
on candidates applying for certification. These data also highlight
the necessity of developing a system for scoring and reporting test
results that takes full cognizance of the philosophical, psychological
and psychometric issues discussed above and which is, at the same time,
practical, feasible and acceptable to both the candidates and the Board.
The method which was adopted, a profile system, is described in the
following chapter.



CHAPTER X:

THE APPLICATION OF THE PROFILE TECHNIQUE
TO THE PROBLEMS OF CERTIFICATION

Criteria for Designing the System



A profile technique of summarizing and reporting the results of
the examination system described in the preceding sections was
developed. This system was designed to meet three criteria which are
often violated in traditional methods of utilizing test data for
purposes of certification. These criteria can be summarized as follows:

(1) Competence in orthopaedic surgery is multifactorial
    in nature; it therefore follows that strength in
    one area cannot be allowed to compensate completely
    for weakness in another.

(2) The unit in terms of which competence is assessed
    should be based on performance factors, not individual
    tests; only when each test technique measures
    a different trait should scores on individual tests
    be considered separately.

(3) Ideally, the level of satisfactory performance on a
    certification examination should be determined prior
    to its administration and should be based on absolute
    standards, not on relative standing in the distribution
    of candidate scores.



Procedures in Implementing the System

In order to meet these criteria the following four performance
factors* were identified as the units in terms of which certification
decisions were to be made on the 1968 Orthopaedic Certification
Examination: Recall of factual data, Analysis and interpretation of
clinical data, Problem-solving ability, and Attitudes toward patients
and colleagues.

* See Table 31.

Sub-scores on each factor were derived from each of the oral
and written tests included in the 1968 test battery. These sub-scores
were converted to a 12-point scale with 3.5 defined as the
minimum passing level. Sub-scores, weighted as shown in Table 59,
were combined to obtain the four factor scores and the total score,
from which a profile similar to that shown in Table 60 was derived
for each candidate.
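
The combination of weighted sub-scores into a factor score can be
sketched as a weighted mean. This is only an illustration: the
sub-score names, values and weights below are hypothetical (the actual
weights are those of Table 59, which is not reproduced here), and a
weighted mean is assumed as the combining rule because it keeps the
result on the same 12-point scale.

```python
def weighted_composite(sub_scores, weights):
    """Weighted mean of sub-scores that are already on the 12-point scale."""
    total_weight = sum(weights[name] for name in sub_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(score * weights[name] for name, score in sub_scores.items())
    return weighted_sum / total_weight


# Hypothetical Recall sub-scores from two techniques, with hypothetical weights.
recall_sub_scores = {"multiple_choice": 5.0, "oral": 7.0}
recall_weights = {"multiple_choice": 3.0, "oral": 1.0}

recall_score = weighted_composite(recall_sub_scores, recall_weights)
print(recall_score)  # (5.0 * 3 + 7.0 * 1) / 4 = 5.5
```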



Prior to the administration of the examination, the following
tentative guidelines for determining certification were adopted:

1. All candidates scoring below 3.5 (the Failing area)
   on Problem Solving, Interpretation, or Recall should
   NOT be certified.

2. All candidates scoring between 3.5 and 6.4 (the Marginal
   area) on the TOTAL should be reviewed and ground
   rules established for the disposition of each case.

3. All candidates scoring above 6.4 (Good or Excellent)
   on the TOTAL and above 3.4 on every factor should
   be certified.
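
The three tentative guidelines can be expressed as a small decision
function. This is a sketch rather than the Board's procedure: the
factor names and the function are illustrative, and any profile that
the guidelines do not explicitly cover (for example, a Total above 6.4
together with a failing Attitudes score) is assumed here to be sent to
review.

```python
COGNITIVE_FACTORS = ("recall", "interpretation", "problem_solving")


def tentative_decision(factor_scores, total):
    """Apply the three tentative guidelines to one candidate profile."""
    # Guideline 1: a failing score on any of the three named factors.
    if any(factor_scores[name] < 3.5 for name in COGNITIVE_FACTORS):
        return "not certified"
    # Guideline 3: Total above 6.4 and every factor above 3.4.
    if total > 6.4 and all(score > 3.4 for score in factor_scores.values()):
        return "certified"
    # Guideline 2 (Marginal Total) and, by assumption, any uncovered case.
    return "review"


candidate = {
    "recall": 5.0,
    "interpretation": 7.5,
    "problem_solving": 7.0,
    "attitudes": 8.5,
}
print(tentative_decision(candidate, total=6.6))
```

Because the factor rule is checked first, a candidate with a high Total
but a failing score on Recall, Interpretation or Problem Solving is not
certified, which reflects the first design criterion: strength in one
area cannot compensate completely for weakness in another.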

In adopting these guidelines for use with a profile system of
scoring the Board formally recognized the multifactorial nature of
orthopaedic competence, since the four performance factors represented
a distillation of the 94 components of competence derived in the
critical incident study and since the Board continued to require (as
previously) that each candidate submit data attesting to his surgical
skill and professional ethics as a condition for admission to the
certifying examination. (Ratings on the Candidate Evaluation Forms
required for each applicant represented an attempt to systematize the
collection of data on the latter qualities.)



Second, the derivation of scores on each factor from a variety of
techniques assured the maximum reliability of each factor score.
While a score on an individual examination may be so unreliable as
to be virtually meaningless, a low score derived across a number of
techniques offers reasonable certainty that the individual is
inadequate on that performance factor. Third, by relying on a
pre-established "Minimum Passing Level" and pre-determined ground rules
the Board assured that "pass-fail" decisions would be based on
absolute standards rather than arbitrary decisions.

In January, 1968 the first certification examination to which
this profile scoring system would be applied was administered to 838
candidates of whom 54 were retaking only a segment of the examination.
The nature of the remaining candidate population is reported in Table
61 below:



TABLE 61

CANDIDATE POPULATION

                                    Graduates of    Graduates of
                                    U. S. Medical   Foreign Medical
                                    Schools         Schools           Total

Admitted to final certifying
exam without prior examination
(due to change in rules of
eligibility)                             54              22             76

Previous examination experience
but no prior examination failure        503              27            530

Previous examination failures           128              50            178



Scores and sub-scores on each test and each factor (Recall,
Interpretation, Problem-Solving and Attitudes) were computed and
profiles drawn for each individual. Univariate statistics were
derived for the total population and for the various sub-groups
described above. These results are reported in Tables 62 and 63.
The interrelations among scores and sub-scores and between them
and the multifactorial rating of candidates by their chiefs were
analyzed by correlational and multiple regression techniques.
Tables 64 and 65 below present the results of that investigation.






Table 62, below, reports the results of performance on the total
examination and on each factor for the over 500 graduates of U. S.
medical schools who were taking the certification examination for the
first time.

TABLE 62

DISTRIBUTION OF SCORES ON 1968 FINAL CERTIFICATION
EXAMINATION FOR U.S. GRADUATES WHO WERE TAKING
THE FINAL CERTIFICATION EXAMINATION FOR THE FIRST TIME

                       Recall    Interpre-     Problem-    Attitudes        Total
Level      Score                    tation      Solving
                    N  Cum. %    N  Cum. %    N  Cum. %    N  Cum. %    N  Cum. %

Excellent   11.5    1   100.0                              7   100.0
            11.0    0                                     16    98.8
            10.5    3    99.8    4   100.0                56    96.0
            10.0    5    99.3    9    99.3    1   100.0   49    86.3
             9.5    6    98.4   29    97.7    8    99.8   54    77.7    6   100.0

Good         9.0    9    97.3   61    92.7   35    98.4   71    68.3    5    99.0
             8.5   18    95.8   70    82.1   44    92.3   64    56.0   27    98.1
             8.0    6    92.7   80    69.9   91    84.7   58    44.9   46    93.4
             7.5   29    91.7   79    56.0  105    68.9   52    34.8   84    85.4
             7.0   35    86.6   63    42.3  111    50.6   47    25.7   98    70.8
             6.5   29    80.5   68    31.3   70    31.3   34    17.6   93    53.7

Marginal     6.0   39    75.5   53    19.5   52    19.1   16    11.7  103    37.6
             5.5   45    68.7   20    10.3   31    10.1   17     8.9   55    19.7
             5.0   48    60.9   13     6.8   16     4.7   14     5.9   35    10.1
             4.5   68    52.5   15     4.5    8     1.9   17     3.5   15     4.0
             4.0  125    40.7    8     1.9    3     0.5    3     0.5    6     1.4
             3.5   67    19.0    1     0.5                              2     0.3

Failing      3.0   32     7.3    2     0.3
             2.5    5     1.7    0
             2.0    3     0.9    0
             1.5    1     0.3    0
             1.0    1     0.2    0

Mean                     5.15         7.42         7.18         8.31         6.63

After reviewing the data on the distribution of scores shown
in Table 62 the tentative guidelines listed above were accepted with
the following minor modifications:

1. [...] it was decided that a failing score on this factor alone
   would not be sufficient cause to withhold certification.
   This change affected only 2 candidates.

2. It was decided to certify everyone whose Total score
   was clearly satisfactory, i.e., 6.5 or above. This
   decision was made on the ground that the Total score
   was significantly more reliable than scores on the
   independent performance factors. This decision resulted
   in the certification of one candidate who otherwise
   might have failed.



In addition the following guidelines were developed for dealing 
with the 38% "marginal" candidates in the normative group: 



1. Certification would be withheld from any candidate 
whose scores were "Marginal" on the Total AND on any 3 
of the 4 performance factors. 

2. Certification would be withheld from any candidate
whose scores were marginal on BOTH Recall and Problem-Solving.
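
The two rules for candidates whose Total falls in the Marginal area
can be sketched as follows. The function and factor names are
illustrative, and two points are assumptions rather than statements in
the text: "marginal" is taken to mean a score from 3.5 through 6.4, and
"any 3 of the 4 performance factors" is read as "at least 3".

```python
FACTORS = ("recall", "interpretation", "problem_solving", "attitudes")


def is_marginal(score):
    """Marginal area of the 12-point scale, assumed to be 3.5 through 6.4."""
    return 3.5 <= score <= 6.4


def withhold_from_marginal_candidate(factor_scores, total):
    """Apply the two rules for candidates whose Total is Marginal."""
    if not is_marginal(total):
        raise ValueError("these rules apply only to a Marginal Total score")
    marginal_factors = {name for name in FACTORS if is_marginal(factor_scores[name])}
    # Rule 1: Marginal on the Total AND on any 3 of the 4 factors.
    rule_1 = len(marginal_factors) >= 3
    # Rule 2: Marginal on BOTH Recall and Problem-Solving.
    rule_2 = {"recall", "problem_solving"} <= marginal_factors
    return rule_1 or rule_2


profile = {"recall": 5.0, "interpretation": 7.0, "problem_solving": 7.0, "attitudes": 8.0}
print(withhold_from_marginal_candidate(profile, total=6.2))
```

In this example only Recall is marginal, so neither rule applies and
certification would not be withheld on these grounds.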



In arriving at these decisions, the American Board of Orthopaedic
Surgery took cognizance of the fact that 76% of the candidates scored
at marginal or failing levels on the Recall factor and 20% scored at
these levels on Problem-Solving. While these results may raise some
doubt about the appropriateness of the pre-determined standards on
Recall, they are compatible with the view that the store of information
readily accessible to a large number of candidates is marginal for
optimal orthopaedic practice, and that in such cases certification
should be awarded only to those whose problem-solving and other skills
are "Good or Excellent." In short, the Board adhered closely to the
guidelines promulgated BEFORE the examination and, for the first time,
implemented a system based on absolute, rather than relative standards.
The effects of the adoption of these guidelines on the various
sub-groups within the candidate population are shown in Table 63 below.

TABLE 63

PERCENT FAILURE RATE AMONG DIFFERENT CANDIDATE POPULATIONS

                              Graduates
                              of U. S.
                              Medical      Foreign
                              Schools      Graduates    Total

No previous
examination experience          24            59          34

No previous
examination failures            18            48          19

Previous
examination failures            59            80          65

Total                           26            67          31
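
The marginal totals in Table 63 can be checked against the candidate
counts in Table 61, since each total should be the count-weighted
average of the cell rates. The sketch below uses only figures from the
two tables; the variable and function names are illustrative. Because
the published cell rates are rounded to whole percentages, agreement to
within about one percentage point is all that should be expected.

```python
GROUPS = (
    "no previous examination experience",
    "no previous examination failures",
    "previous examination failures",
)
SCHOOLS = ("us", "foreign")

# Candidate counts from Table 61.
counts = {
    GROUPS[0]: {"us": 54, "foreign": 22},
    GROUPS[1]: {"us": 503, "foreign": 27},
    GROUPS[2]: {"us": 128, "foreign": 50},
}
# Percent failure rates from the body of Table 63.
failure_pct = {
    GROUPS[0]: {"us": 24, "foreign": 59},
    GROUPS[1]: {"us": 18, "foreign": 48},
    GROUPS[2]: {"us": 59, "foreign": 80},
}


def pooled_pct(cells):
    """Count-weighted average failure rate over (count, percent) cells."""
    cells = list(cells)
    total = sum(count for count, _ in cells)
    failures = sum(count * pct / 100.0 for count, pct in cells)
    return 100.0 * failures / total


row_totals = {
    g: pooled_pct((counts[g][s], failure_pct[g][s]) for s in SCHOOLS) for g in GROUPS
}
column_totals = {
    s: pooled_pct((counts[g][s], failure_pct[g][s]) for g in GROUPS) for s in SCHOOLS
}
grand_total = pooled_pct(
    (counts[g][s], failure_pct[g][s]) for g in GROUPS for s in SCHOOLS
)

for name, value in {**row_totals, **column_totals, "all candidates": grand_total}.items():
    print(f"{name}: {value:.1f}%")
```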









In a system such as this in which decisions are made on the
basis of pooled judgments of numerous observers, each sampling (as
objectively as possible) a "bit" of candidate behavior, it is
important to consider the interrelations among examination scores and
between them and the ratings made by senior staff who are familiar
with the candidate. Table 64 summarizes such data.



TABLE 64

INTERCORRELATIONS OF EXAMINATION SCORES
AND RATING FACTORS

(N=391)

                                     Examination Scores
                                    Interpre-  Problem-
                          Recall    tation     Solving    Attitudes   Total

Examination Scores:
  Recall                    X
  Interpretation           .42        X
  Problem-Solving          .48       .69         X
  Attitudes                .22       .46        .40          X
  Total                    .84       .77        .83         .51         X

Rating Factors:
  Problem-Solving          .32       .36        .30         .21        .39
  Patient Relations        .13       .22        .17         .17        .21
  Overall Competence       .29       .38        .29         .17        .35



As might be expected, these data indicate that within the examination,
scores on "Interpretation" and "Problem Solving" are most closely
related to each other, and those on "Recall" and "Attitudes" are least
so. Secondly, though the correlations between the examination scores
and training chiefs' ratings are generally low (due in part to
unreliability, especially of the latter) the data reveal a
significantly higher correlation between the chief's rating of the
candidate's problem-solving skills and the examiner's rating of the
cognitive components of competence, than between the chief's and the
examiner's assessment of the affective components of competence.
Thirdly, it should be noted that the score on Interpretation is more
highly correlated than any other examination factor with the chief's
rating of Overall Competence; a result that probably reflects the
fact, noted earlier, that many of the staff-resident encounters are
concerned with the interpretation of X-Ray and other diagnostic studies.
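
These readings of Table 64 can be checked directly from the
coefficients. The sketch below simply stores the relevant entries of
the table (the dictionary layout and variable names are illustrative)
and picks out the largest and smallest values.

```python
# Intercorrelations among the four examination factor scores (Table 64).
exam_intercorrelations = {
    ("Recall", "Interpretation"): 0.42,
    ("Recall", "Problem-Solving"): 0.48,
    ("Interpretation", "Problem-Solving"): 0.69,
    ("Recall", "Attitudes"): 0.22,
    ("Interpretation", "Attitudes"): 0.46,
    ("Problem-Solving", "Attitudes"): 0.40,
}

# Correlation of each factor score with the chief's rating of
# Overall Competence (last row of Table 64).
with_overall_competence = {
    "Recall": 0.29,
    "Interpretation": 0.38,
    "Problem-Solving": 0.29,
    "Attitudes": 0.17,
}

strongest_pair = max(exam_intercorrelations, key=exam_intercorrelations.get)
weakest_pair = min(exam_intercorrelations, key=exam_intercorrelations.get)
best_single_factor = max(with_overall_competence, key=with_overall_competence.get)

print("Most closely related:", strongest_pair)
print("Least closely related:", weakest_pair)
print("Highest correlation with Overall Competence:", best_single_factor)
```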

Finally, the results of the multiple correlational analysis using
examination performance factors as independent variables (see Table 65)
provide further evidence as to the validity of the profile technique of
scoring and reporting. The data indicate that a single score by itself
is not sufficient to describe competence. Second, they suggest that
the components of competence measured by the examination scores on
"Recall" and "Interpretation" are about equally decisive, and that
they are both more important than the components of competence measured
by the "Problem-Solving" and "Attitude" scores, in contributing to
the chief's judgment of overall competence and of many of the cognitive
skills which he rated. This finding is, in itself, significant in
considering the basis on which training chiefs evaluate resident
competence. Third, the aspect of competence measured by the "Attitude"
score contributes significantly only to the prediction of the chief's
rating of "Effectiveness in Patient Relations." Finally, the empirical
weightings of the various scores (multiple regression equation) are,
for the most part, similar to the pre-assigned weights (Table 59)
which had been developed on purely logical criteria for purposes of
computing the factor scores and the composite total score on the 1968
Certifying Examination.



Summary Comment 

The profile system of scoring and reporting test data, as developed
by the American Board of Orthopedic Surgery for use in its 1968
examination process, constituted a major innovation in procedures for
specialty certification. Among the most important characteristics of
this system are: (1) the provisions developed for obtaining
appropriately weighted pooled judgments derived from a number of sources
about each major performance factor that contributes to competence,
(2) the methods developed for determining and applying absolute
standards in judging performance, (3) the nature of the feedback the
system provides to candidates, the training chiefs and the Board and
(4) the self-corrective mechanisms which this feedback system
stimulates.









SECTION THREE

CURRENT STATUS AND PROJECTED NEXT STEPS



CHAPTER XI:

OUTCOMES OF THE
ORTHOPAEDIC TRAINING STUDY



To provide a context in terms of which the major outcomes of the
4-year Orthopedic Training Study can be summarized and evaluated it
would be well to recall briefly both the immediate objectives and the
long-range goals for which that research project was designed. Since
the study was initiated as a direct consequence of the continuing
concern of the American Board of Orthopedic Surgery for systematic
improvement of its certification procedures, the development of more
valid and more reliable techniques of assessing professional competence
in orthopedics constituted the immediate research aim. This, in turn,
entailed the development both of a methodology and of specific
instruments for the three-fold purpose (a) of defining professional
competence in operational terms, (b) of analyzing existing
certification techniques and (c) of constructing and validating more
appropriate ones. Second, it was clear from the outset of the study
that the problems faced by the American Board of Orthopedic Surgery are
not unique to it; they are common to all groups responsible for setting
professional standards and are especially urgent in all of the health
fields where rapid scientific advances combined with increased popular
demand for more and better professional services have exacerbated the
problems of setting and maintaining standards. Thus, the development of
a model for professional self study became an intermediate goal of
study. Finally, evaluation was viewed by both the research team
and the Board as an integral part of the educational process and thus,
as a prerequisite for increasing the efficiency and enhancing the
effectiveness of professional training, these being the long-term
objectives of a project designed to contribute ultimately to the
better utilization of scarce manpower resources.



Immediate Outcomes 

Materials 

The specific instruments developed in this project, together with
the findings regarding each, have been described in detail in previous
chapters; here they may be briefly reviewed under the following
headings: rating forms, tests and test manuals, forms for profile
scoring and reporting of test results, and observational forms for
process analysis of tests.




In summary, two types of rating forms were developed: one for use
in scoring oral or practical examinations; the other, for recording
assessments of habitual behavior, the latter developed to obtain
evidence on those skills and attitudes that cannot be adequately
sampled in limited "test" situations, as well as to obtain data for
studies of the concurrent validity of other instruments. Irrespective
of the purpose for which they were devised, the setting in which they
were used or the group to whom they were applied, all rating forms
retained for incorporation in the regular certification procedure shared
three important characteristics: (a) they specified distinct aspects
(factors) of performance which the observer was asked to rate; (b) they
utilized a 12-point scale on which points were grouped into four levels
of performance with each level (usually) defined in terms of absolute,
rather than relative, standards of performance; and (c) either on the
rating scale itself, or in an accompanying manual of instructions, each
factor was operationally defined and the behavior representing each end
of the scale was described and illustrated concretely. These three
characteristics of the rating forms are regarded as of primary
importance.



The specific test materials developed for use in this study included
from 1 to 5 forms of standardized written and oral exercises of a
variety of types incorporated in the in-training and certification
examinations scheduled during the research period. However, unlike
most other research of the same nature, the actual instruments are
probably of less significance than the new techniques and associated
manuals devised during the course of the study. Among these, the ones
of greatest long-run value appear to be the following: Written
Simulations of both diagnostic and therapeutic problems in patient
management requiring sequential analysis and decision, oral simulations
of physician-patient and physician-colleague encounters, and oral
exercises sampling interpretive and problem-solving skills, the latter
as applied to diagnostic and treatment problems in both emergency and
chronic disease situations. All of these new techniques share 3 common
characteristics: first, each was devised to sample a clearly defined
segment of competence identified in the critical incident study as one
of the requisites of effective professional performance; second,
whether in oral or written format, all exercises are based on
standardized case materials presented in a manner that elicits as
directly as possible the behavior each was designed to sample; third,
methods of recording (and/or observing) the examinee's responses to the
test situation are such as to afford reasonable objectivity in scoring
and to facilitate the application of pre-determined and explicitly
defined standards in judging the candidates' performance. Finally, for
each of the new test techniques a brief manual has been prepared
outlining procedures for developing appropriate test materials, for
administering and scoring that type of exercise, and for setting minimal
acceptable standards and interpreting performance on it.



As with the test materials, the specific forms and procedures
developed for profile scoring and reporting of individual candidate
performance in orthopedic surgery, while available, are probably of less
generalized significance than the overall rationale and methodology
that was developed as a basis for deriving the specific forms. The
key advantages of the system as such appear to be threefold: (1) by
combining numerous "bits" of information obtained from samples of
candidate performance in a variety of types of settings the composite
result is maximally reliable; (2) since each point of the profile is
based on behavioral factors (e.g., interpretive skill) rather than
on techniques (e.g., score on multiple choice questions) the "picture"
of competence yielded by the profile corresponds more closely to
operational definitions of competence employed by training chiefs and
colleagues in evaluating physician performance than does the picture
which emerges from more conventional scoring techniques; (3) finally,
the nature of the profile provided on each candidate greatly facilitates
the application of appropriate standards despite the great variety in
patterns of professional competence characteristic of any population of
applicants.
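
The first of these advantages, the gain in reliability from pooling
many observations, can be illustrated with the Spearman-Brown formula.
This is an illustration only, not a computation reported in the study:
the formula assumes that the pooled measures are parallel (equally
reliable and measuring the same trait), and the single-measure
reliability of .30 used below is hypothetical.

```python
def spearman_brown(single_reliability, k):
    """Reliability of a composite of k parallel measures.

    Spearman-Brown prophecy formula: k * r / (1 + (k - 1) * r).
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    return k * r / (1.0 + (k - 1) * r)


# Hypothetical reliability of a score from one individual exercise.
single = 0.30
for k in (1, 2, 4, 8):
    print(f"{k} pooled measures: reliability {spearman_brown(single, k):.2f}")
```

Under these assumptions, a score that is of little use on its own
(.30) rises to about .63 when four comparable observations are pooled,
which is the sense in which a factor score drawn from several
techniques is more dependable than any single examination score.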



In addition to the specific instruments developed for use in
assessing individuals, the two observational forms developed for
describing and evaluating the tests themselves deserve special mention.
The somewhat different forms utilized in the studies of the traditional
and the new type oral examinations shared two important characteristics:
(1) insofar as possible the observer was required to identify the nature
of candidate (and/or examiner) behavior elicited during the test period;
he was not asked to evaluate the "quality" of the examination; (2) the
observer was instructed to make such a descriptive recording for each
unit of behavior rather than to furnish an account based on overall
impressions. This technique of classifying and recording bits of
information appeared to maximize the objectivity and the reliability of
the observations while furnishing the basic data essential for a
subsequent qualitative judgment regarding the content validity of each
type of oral.



Findings

No attempt will be made here to provide a comprehensive summary
of the specific results discussed fully in earlier chapters; at this
point it would seem most appropriate merely to outline briefly the
major trends observed with respect to each of the following general
categories of findings: (1) those concerned with the reliability and
validity of each of the new test techniques; (2) those concerned with
variations in patterns of performance associated with age, level of
education and experience, nature of practice setting and the like
(obtained from cross-sectional studies of different population samples);
(3) those concerned with changes in professional achievement associated
with increased education and/or experience (as obtained from
longitudinal studies of the same population sample).



Studies of the reliability of the various measures indicate that
reasonably lengthy multiple choice examinations and written simulations
of patient management problems are highly reliable measures when
reliability is defined as degree of internal consistency; however, there
is sufficient variation in approach to different types of patient
management problems to indicate that a number of problems, both diagnostic
and therapeutic, ranging from emergency to comprehensive care situations,
and sampling a number of clinical entities should be included in one
examination in order to generalize the results to a universe of varied
clinical problems. As regards the oral examinations, all achieve a
level of interrater reliability sufficient to justify their inclusion
in a battery of tests, provided single scores from individual orals
are not used independently to determine passing and failing. Further,
the traditional orals, as well as the new simulation and interpretive
orals, reach acceptable levels of sampling reliability provided scores
are not treated independently. However, sampling reliability of the
problem-solving orals is sufficiently lower to suggest the necessity of
using several cases even when this examination is employed as part of a
test battery.
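
The notion of reliability as internal consistency, and the remark that reasonably lengthy examinations are the reliable ones, can be illustrated with a short sketch. The report does not state which coefficient was computed; Kuder-Richardson formula 20 (for items scored right or wrong) and the Spearman-Brown formula (for the effect of test length) are standard choices and are used here only as assumptions, as are the function names and the sample data.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for examinee rows of 0/1 item scores.

    `responses` is a list of equal-length lists, one per examinee.
    The choice of KR-20 is an assumption; the report names no coefficient.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)  # population variance of total scores
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # proportion correct
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical data: five examinees, four items.
sample = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(sample), 3))
print(round(spearman_brown(0.5, 2), 3))
```

Under the Spearman-Brown assumption of parallel added material, doubling a test whose reliability is 0.50 raises the predicted reliability to about 0.67, which is the sense in which length contributes to reliability.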

Findings with respect to validity of the various techniques are
somewhat more difficult to summarize: Content validity, studied by
both process and observational analysis, was judged to be significantly
higher for all of the newer techniques than for the more conventional
techniques. Construct validity was studied by exploring the congruence
between hypotheses about the performance of groups at various levels of
training and experience, and their actual performance on a given test.
As might be expected, these studies indicate that despite essentially
complete overlap in the range of scores for groups at different levels
of training, mean scores on most tests differ in the expected direction.
Amount of training is most highly correlated with performance on tests
that measure general orthopaedic information or decisiveness about
therapy and least correlated with scores on tests designed to assess
thoroughness of diagnostic work-up or the ability to relate to patients.
These results are consistent with other information about the general
nature and relative emphasis that characterize most training programs.
Further, studies of the influence of experience and type of practice on
responses to the written simulations of patient management problems
reveal the same types of relationships as described in observational
studies of practitioner performance. Concurrent validity of the various
measures was investigated through correlational and factor analytic studies
of the interrelation among scores on different types of examinations
and between them and supervisors' ratings of performance. These studies
revealed that there was considerable overlap in the conventional written
and oral examinations and that the newer techniques appeared to measure
aspects of competence not previously sampled. The intercorrelations of
scores on the new techniques (when corrected for attenuation) yielded a
factor structure compatible with hypotheses as to the interrelationship
among aspects of competence each was designed to measure. However,
correlations between test scores and supervisors' ratings were often
disappointingly low, though the patterns of these correlation matrices
were as predicted when due allowance was made for the sometimes
exceedingly low reliabilities and often very large errors due to "halo"
effects that characterized the ratings. The predictive validity of
the newer certification techniques is to be investigated in the
proposed ten year follow-up study outlined below.
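
The correction for attenuation mentioned above, and the allowance made for unreliable ratings, rest on the classical formula in which an observed correlation is divided by the square root of the product of the two reliabilities. The following is a minimal sketch of that formula only; the function name and the numerical values are hypothetical and are not taken from the study.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation between true scores on two measures.

    r_xy  : observed correlation between the measures
    rel_x : reliability of the first measure (0 < rel_x <= 1)
    rel_y : reliability of the second measure (0 < rel_y <= 1)
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values: a reliable test (0.90) correlated with
# unreliable supervisor ratings (0.40).
observed = 0.30
print(round(correct_for_attenuation(observed, 0.90, 0.40), 2))
```

With figures of this kind an observed correlation of 0.30 corresponds to an estimated true-score correlation of about 0.50, which shows how low reliability in the ratings alone can depress the observed relationship between test scores and ratings.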

Among the findings in regard to variations in performance associated
with age, level of training, practice setting and the like, the following
are of greatest significance: the relatively slight differences found
between groups at different levels of training on several of the
achievement measures, the consistent tendency for performance of
practitioners on the written simulations to decline with age and with
remoteness of affiliation with a teaching institution, and the striking
diminution in diagnostic thoroughness associated with increased amounts
of training and experience.

Finally, data are now available from repeated administrations at
one year intervals of parallel types of achievement tests to a population
on which substantial amounts of biographical information are also
available. These data are now being analyzed as a preliminary step in
a newly initiated study of the relationship between patterns of growth
in professional competence and specified training variables. Preliminary
analysis suggests that in the absence of special factors, increased
individual achievement associated with increased amounts of training will
be most readily demonstrable in terms of the amount of specialized
information the individual can recall, the level of surgical skill he
displays and the decisiveness with which he embarks on a plan of
treatment, and that least achievement will be demonstrable in the areas
of professional habits and attitudes. The extent to which these general
trends are modified by variations in the nature of the training program
is the subject of the forthcoming study.



Intermediate Outcomes 



In planning the Orthopedic Training Study it was hoped that 
experience with certain of the methodologies developed during the 
course of the investigation could provide a basis for developing a 
generalized model for professional self-study applicable to other 
groups in the health professions. In this context attention should 
therefore be redirected both to the rationale of the study and to 
the more important characteristics of the organizational structure 
evolved for implementing it. 

The approach underlying the study has been explicitly treated
at numerous points throughout this report and particular methodological
developments of general interest are noted above among the materials
immediately available from the study. This discussion is therefore
limited to identification of the following major features of the
study rationale that appear to be of greatest general applicability:
(1) the use of an empirically derived, behavioral definition of the
essential components of professional competence to guide every stage
of the study; (2) the focus on the nature of the behavior to be sampled
in the design, scoring, and evaluation of new assessment techniques;
(3) the provision for systematic feedback of results to members of the
profession responsible for maintaining standards and for making decisions
regarding training and certification policies to implement these standards;
(4) the utilization of reliable and valid assessment not as an end in
itself nor even as a means of maintaining professional standards, but
as a prerequisite to sound educational experimentation and as an
indispensable part of any educational program.

Any intensive professional self-study necessarily entails an
interdisciplinary approach; however, though they have led to recommendations
for modification of educational programs, many such interdisciplinary
studies have failed to produce any solid long-term accomplishment. It
is for this reason if for no other that in developing a generalized model
for professional self-study it is of considerable importance to consider
the nature of the organizational structure required both to promote
effective utilization of the various types of expertise needed during
the course of the investigation and to facilitate the ultimate transfer
of responsibility for implementing the findings of the study, from a
specially appointed research team to the regularly constituted
policy-making bodies of the profession. Experience suggests that the
following pre-existing conditions were especially favorable in the
present study:



(1) [illegible] in the orthopedic profession vis-a-vis [illegible]
issues. Long before the initiation of the study
the leadership in the orthopedic profession had evidenced
continued concern about problems of education and evaluation;
for example, it was the first medical specialty to introduce
an in-training examination to assist in monitoring resident
progress; it had previously sought guidance from educational
specialists and it was a request from the Board for a review
of its certification procedures that stimulated the conversations
eventuating in the research proposal represented by this study.

(2) [illegible] a large number of orthopedists in the
certification procedures of the Board. For a number
of years the Board had made it a regular practice to utilize
the services of over 200 senior members of the specialty (including
virtually all of those with major responsibilities for training
programs) in developing written examination materials and
administering oral examinations; during this period the Board had
developed regular procedures for recruiting and orienting new
examiners, for consulting with both them and candidates about
the conduct of the examinations and methods of improving them,
and for feeding back to this cadre information about the results
of the examination.

(3) Previous experience of the Center in interdisciplinary research in
medical education. Prior to the present study all members of
the Center staff had had long experience in interdisciplinary
research in education and many of them specifically in medical
education; the basic research staff itself included specialists
from both medicine and education. Both of these circumstances
greatly mitigated communication difficulties often encountered
in interdisciplinary research.

(4) The methods evolved for conducting this study. From its inception
the study was viewed as a joint undertaking of the Board and the
Center; the research proposal was cooperatively developed; provision
was made from the outset for periodic joint review and planning,
for allocation of specified responsibilities between the Board
and the Center, for budgeting to include orthopedists recruited
by the Board both as a part of the regular research staff and
as full-time consultants on special aspects of the study for
periods ranging from a few days to several weeks, for travel
and meeting funds to support the efforts of numerous Task Forces
appointed by the Board to work with the research staff on
specific problems, for regular communications on the nature
and progress of the study to the profession at large either
directly from the Board or via it from the research staff and,
finally, for the Board to take over the implementation of new
policies and procedures once they had been developed to an
operational level by the research staff.*



That this transfer of responsibility has occurred is evidenced by 
the fact that either directly or indirectly the study has influenced 
introduction of, or planning for, the following modifications in Board 
certifying policies or procedures: 



(1) In accord with the components of competence defined in the
critical incident study the Board has established an examination
blueprint specifying the cognitive skill, attitudinal processes and the
subject matter content to be evaluated, and the weight to be assigned
each in the certifying examination; this blueprint has been adopted for
use in defining the specifications for all examinations under the Board's
jurisdiction.

(2) The Examination Committee of the Board has established regular
procedures for maintaining and updating the classification of materials
in the examination pool in accord with the categories of the blueprint.












*See Appendix 29 for a detailed listing of joint activities of the Center
and Board of a special nature.

(3) The Board has established regularly constituted task forces
charged with responsibility for developing, reviewing, refining and
upgrading materials for the multiple-choice, written simulation and oral
components of the examination.

(4) The Board has developed a greatly improved system of preparing
and reviewing written examinations for certification to assure conformity
to specifications of the blueprint.

(5) The total certifying process has been redesigned to yield
evidence on the following aspects of professional competence:



I Surgical skill 

II Professional habits and attitudes 
III Ability to recall information 
IV Ability to interpret and analyze data
V Ability to solve problems (including clinical judgment) 

VI Ability to relate effectively to patients and colleagues 

Evidence on Factors I and II is to be gathered primarily by use of
questionnaires and rating scales developed during the present study. The
Multiple Choice Examination has been redesigned to yield evidence on
Factors IV and V, as well as Factor III. Written simulations (Patient
Management Problems) are to be utilized as a regular part of the
certifying examination to yield evidence on Factor V (and where relevant,
on Factor IV). The Oral Examination has been redesigned to include three
half-hour examinations designed to assess interpretive skill and one
half-hour examination designed to assess skill in relating to patients
(simulated physician-patient encounters). Each component of the oral
examination is administered by trained examiners utilizing previously
prepared standardized case materials and is scored on standard, objective
rating forms.



(6) The previous scoring system, in which each examination was treated 
independently, has been replaced by a profile of performance in which 
evidence from several sources is combined to yield the most reliable 
assessment of each factor. 

(7) A program for training examiners in the development of standardized 
materials and in the administration and scoring of oral examinations has 
been instituted. 

(8) Provision has been made for the establishment and up-dating of 
a data bank, to contain all "bits" of information which the certification 
process yields. 

(9) The Board has established its own Office of Education and Evaluation,
staffed with a full-time director to implement these revised policies and
procedures.



CHAPTER

A LOOK TO THE FUTURE



The summary of current status suggests the logical next steps to
be taken in the immediate future. In addition to the provisions for
continued improvement in the regular [illegible] of the Board and the
further analysis [illegible] on the relationships between aptitude and
achievement and between achievement patterns and training variables
discussed previously, two major extensions of the present research are
clearly indicated.

A Proposed Ten Year Follow-Up Study*

In the [illegible] plan of the Orthopedic Training Study it was
[illegible] follow-up would be made of candidates applying for
certification during the period of the research to determine
the differences, if any, in the quality of health care delivered by
successful and unsuccessful candidates. The detailed proposal for such
[illegible] certification in the three years just prior to the
initiation of the study and in the three years immediately
following the introduction of the new certifying procedures. It is
suggested that [illegible] samples specified performance data be
collected [illegible] and that [illegible] on various certifying
instruments [illegible] performance [illegible] information not only on
the predictive validity of the certifying process, but also on the
changes over time in patterns of competence of significance in improving
current residency training.

*For a detailed outline of this study see Appendix 30.




Educational Improvement Studies

The related interests of the American Board of Orthopaedic Surgery,
the Musculo-Skeletal Committee (NRC-NAS) and the Center for the
Study of Medical Education are joined in an experimental study of
educational innovation building on and extending the current investigation
in order to achieve the following broad objectives:



1. To provide a model of individualized graduate education in 
medicine in which the demonstration of individual competence, rather 
than the fulfillment of rigid time and content requirements, marks 
the end point of formal training. 



2. To document the nature and variations of orthopedic training 
in the United States. 

3. To devise and test methods for increasing the efficiency and 
effectiveness of orthopedic training. 

4. To determine the relationships between input, training and 
output variables. 



5. To develop mechanisms that will facilitate continuing insti- 
tutional self-study of training programs. 

6. To develop a pool of educational specialists in orthopedics
who can provide continuing leadership in the field.



In the joint study initiated in July, 1968, it is proposed to 
accomplish these objectives in a three-stage study devoted first to 
an intensive investigation of the nature of current training experience, 
second to controlled experimental modification in educational activities
in selected training programs, and finally to an analysis of the inter- 
relations among input, output and training variables. 



In the first stage of the study data about each program in both 
the control and experimental groups will be collected with regard to the
following: 



1. Program organization--including schedule of resident rotation,
the personnel who supervise training, the facilities and resources to
support the training.

2. Program objectives--the mechanism of their establishment,
review and communication to staff and residents.

3. Program operation--activities and responsibilities of a
resident sample, the nature of instructional procedures both formal
[illegible], the nature of [illegible] to residents of their individual
strengths and weaknesses as training progresses.

4. Program evaluation--the mechanisms employed to accumulate
[illegible] resident progress and program effectiveness, and the
utilization of these data in continuing program review.

5. Program perceptions--identification of similarities and
differences among residents and staff in the perception of
purposes, procedures and effectiveness.

It is anticipated that such intensive review will quickly identify
areas in which new organization of training systems, or utilization of
alternative instructional modes would predictably increase either
the efficiency or the effectiveness of training. In the experimental
institutions this would lead in the second stage of the study to
introduction of specific instructional innovations as well as to more
fundamental modifications in program organization and in those more
subtle and pervasive factors of staff-trainee interactions that influence
the basic climate for learning.

As changes both in specific methodology and in the general climate
for learning are introduced their effect upon resident achievement will
be assessed through both cross-sectional and longitudinal studies to
which the third stage of the study will be increasingly devoted.

* * *

The research outlined above is a direct outgrowth of the current
study. It is with the view of the future provided by the initiation
of the new study in July, 1968, that the report of the first Orthopaedic
Training Study is most fittingly terminated.



Appendix 1



MATERIALS FOR THE EVALUATION OF PERFORMANCE

IN

MEDICINE




DETERMINATION OF OBJECTIVES 




prepared by 


The Evaluation Unit 

Center for the Study of Medical Education 
University of Illinois, College of Medicine 
January, 1967.

ORTHOPAEDIC TRAINING STUDY
AMERICAN BOARD OF ORTHOPAEDIC SURGERY 



AND 



CENTER FOR THE STUDY OF MEDICAL EDUCATION 
UNIVERSITY OF ILLINOIS 



Critical Performance Requirements for Orthopaedic Surgeons 
(derived from The 1964 Critical Incident Study) 



I. Skill in Gathering Clinical Information

A. Eliciting Historical Information
1. Obtaining adequate information from the patient
2. Consulting other physicians
3. Checking other sources

B. Obtaining Information by Physical Examination
1. Performing thorough general examination
2. Performing relevant orthopedic checks

II. Effectiveness in Using Special Diagnostic Methods

A. Obtaining and Interpreting X-rays
1. Directing or ordering appropriate films
2. Obtaining unusual, additional or repeated films
3. Rendering complete and accurate interpretation

B. Obtaining Additional Information by Other Means
1. Obtaining biopsy specimen
2. Obtaining other laboratory data

III. Competence in Developing a Diagnosis

A. Approaching Diagnosis Objectively
1. Double-checking stated or referral diagnosis
2. Persisting to establish definitive diagnosis
3. Avoiding prejudicial analysis

B. Recognizing Condition
1. Recognizing primary disorder
2. Recognizing underlying or associated problem

IV. Judgment in Deciding on Appropriate Care

A. Adapting Treatment to the Individual Case

1. Initiating suitable treatment for condition
2. Treating with regard to special needs
3. Treating with regard to age and general health
4. Attending to contraindications
5. Applying adequate regimen for multiple disorders
6. Inventing, adopting, applying new techniques

B. Determining Extent and Immediacy of Therapy Needs
1. Choosing wisely between simple and radical approach
2. Delaying therapy until diagnosis better established
3. Testing milder treatment first
4. Undertaking immediate treatment

C. Obtaining Consultation on Proposed Treatment
1. Asking for opinions
2. Incorporating suggestions

V. Judgment and Skill in Implementing Treatment

A. Planning the Operation
1. Reviewing literature, X-rays, other material
2. Planning approach and procedures

B. Making Necessary Preparations for Operating
1. Preparing and checking patient
2. Readying staff, operating room, supplies

C. Performing the Operation
1. Asking for confirmation of involved area
2. Knowing and observing anatomical principles
3. Using correct surgical procedures
4. Demonstrating dexterity or skill
5. Taking proper precautions
6. Attending to details
7. Persisting for maximum result

D. Modifying Operative Plans According to Situation
1. Deviating from pre-planned procedures
2. Improvising with implements and materials
3. Terminating operation when danger in continuing

E. Handling Operative Complications
1. Recognizing complications
2. Treating complications promptly and effectively

F. Instituting a Non-Operative Therapy Program
1. Using appropriate methods and devices
2. Applying methods and devices correctly

VI. Effectiveness in Treating Emergency Patients

A. Handling Patient
1. Properly applying splints and other protective measures
2. Handling and transporting carefully

B. Performing Emergency Treatment
1. Determining location and extent of injuries
2. Attending immediately to lifesaving procedures
3. Treating most critical needs first
4. Obtaining and organizing help

VII. Competence in Providing Continuing Care

A. Paying Attention Post-Operatively
1. Administering suitable post-operative care
2. Recognizing post-operative complications
3. Adequately treating post-operative complications

B. Monitoring Patient's Progress
1. Checking on effectiveness of therapy
2. Reassessing, altering or repeating treatment

C. Providing Long-Term Care
1. Arranging for rehabilitative care, socio-economic assistance
2. Explaining and monitoring home and rehabilitative care

VIII. Effectiveness of Physician-Patient Relationship

A. Showing Concern and Consideration
1. Taking personal interest
2. Acting in discreet, tactful, dignified manner
3. Avoiding needless alarm, discomfort, or embarrassment
4. Speaking honestly to patient and family
5. Persuading patient to undertake needed care, or only needed care

B. Relieving Anxiety of Patient and Family
1. Reassuring, supporting or calming
2. Explaining condition, treatment, prognosis or complication

IX. Accepting Responsibilities of a Physician

A. Accepting Responsibility for Welfare of Patient
1. Heeding the call for help
2. Devoting necessary time and effort
3. Meeting commitments
4. Insisting on primacy of patient welfare
5. Delegating responsibilities wisely
6. Adequately supervising residents and other staff

B. Recognizing Professional Capabilities and Limitations
1. Doing only what experience permits
2. Asking for help, advice or consultation
3. Following instructions and advice
4. Showing conviction and decisiveness
5. Accepting responsibility for own errors
6. Referring cases to other orthopedists and facilities

C. Relating Effectively to Other Medical Persons
1. Supporting the actions of other physicians
2. Maintaining open and honest communication
3. Helping other physicians
4. Relating in discreet, tactful manner
5. Respecting other physician's responsibility to his patient

D. Displaying General Medical Competence
1. Detecting, diagnosing, (treating) non-orthopedic disorders
2. Obtaining appropriate referrals
3. Preventing infection in hospital patients
4. Effectively keeping and following records

E. Manifesting Teaching, Intellectual and Scholarly Attitudes
1. Lecturing effectively
2. Guiding and supporting less experienced orthopedists
3. Encouraging and contributing to fruitful discussion
4. Contributing to medical knowledge
5. Developing own medical knowledge and skills

F. Accepting General Responsibilities to Profession and Community
1. Serving the profession
2. Serving the community
3. Maintaining personal and intellectual integrity



A Taxonomy of Intellectual Processes*



The hierarchical ordering of this taxonomy is intended to
imply that a certain degree of achievement at one or more of the
lower levels is a necessary, though not a sufficient, condition
for success at any higher level. It therefore follows that
examinations which contain a large number of questions at the higher
levels do not minimize the fundamental importance of basic
information. Indeed they stress it, by assuring that the candidate
cannot pass the examination unless he has the information at his
command, and understands it sufficiently to make use of it in solving
new problems.



LEVEL 1: RECALL

Items testing RECALL of isolated information.
This will include recognition of typical morphologic lesions and
questions about specific facts, concepts, principles, processes,
and theories. Whether or not it is explicitly so formulated, such
a question will ordinarily be asking: "What is X?"

and

Items testing recognition of MEANING or implication. Such items
will require the student to provide something more than a textbook
or classroom answer, but do not require any significant degree of
interpretation or application. Items of this type differ from those
described in the above in that they will ask, for example, NOT "What
is a blood test or what does it do?" but instead will ask: "Since
a blood test does X, what does this mean that you can learn from it
about Y?"

Illustration: The tensor fascia lata muscle is innervated by the:

a. inferior gluteal
b. femoral
c. superior gluteal
d. obturator



* Prepared by the Committee on Student Appraisal, University of 
Illinois, College of Medicine. 





LEVEL 2: GENERALIZATION OR EXPLANATION

Items requiring the student to select a relevant GENERALIZATION
to explain specific phenomena. Ordinarily items of this type will
differ from Level 1 items in asking "Why is X true?" or "How do
you explain X?" rather than in asking "Is X true?" or "What does X
mean or imply?"

Illustration: The single most important step in the surgical
technique of amputation is:

a. suturing of the fascia
b. bringing a muscle pad over the bone end
c. accurate proportions between the anterior
and posterior flap design
d. careful hemostasis and if necessary a drain
for a short period postoperatively
e. accurate approximation of the flap edges

LEVEL 3: PROBLEM-SOLVING OF A FAMILIAR TYPE

Items requiring the student to make SIMPLE INTERPRETATIONS of DATA.
Items of this type require the student to translate verbal, tabular,
morphologic, or graphic data into another form (i.e., to read the
data), or to make interpolations or extrapolations from the data.

and

Items requiring the student to APPLY a single principle or A STANDARD
COMBINATION OF PRINCIPLES to a situation of a familiar type. In
items of this type, while the specific content of the problem will
be new to the student, the problem will involve a familiar pattern
of attack.

Illustration: A 21 year old white male is involved in an auto
accident sustaining a laceration of the face
and an obvious closed mid third fracture of the
left femur. The patient complains of chest pain
and is short of breath. B/P 100/90, R 30, Pulse 100.

What roentgenogram should be obtained? The
patient's general condition is adequate to permit
all indicated films to be made.

a. skull films and x-rays of left femoral shaft
b. left femoral shaft, left hip in AP view, chest
c. left femoral shaft and chest
d. AP and lateral view of chest




LEVEL 4

Items requiring the ANALYSIS of data. Items of this type will
require the student to identify constituent elements and
relationships in a set of data, to judge their internal consistency and
to comprehend the organizational principles involved.

and

Items requiring the student to APPLY A UNIQUE COMBINATION OF
PRINCIPLES to solve a problem of a novel type. In items of this type,
both the specific content and the character of the problem will be
new to the student to the extent that solution will require a novel
pattern of attack, not previously illustrated in classroom or
textbook problems.

Illustration: (Note: The following item was preceded by a
description of the presenting complaint, a brief
history, a few questions, and additional data
about the course of the disease process over the
next two weeks.)

On the basis of this additional information, which of the
following measures might provide the most useful information?

a. an electromyographic study of the abdominal musculature
right and left
b. spinal tap with spinal fluid analysis
c. a complete blood count
d. additional x-ray studies of the thoracolumbar region
e. an erythrocyte sedimentation rate



LEVEL 5: EVALUATION

Items requiring the EVALUATION of a total situation. Items of this
type may be based on a case report of the type prepared for the
typical clinical-pathological conference, or a research report, or the
presentation of a theory together with evidence, and will require
the student to evaluate the total presentation.





Illustration: (Note: The following question was preceded
              by a description of the presenting complaint,
              a brief history, a few questions and then
              several sets of additional data including in-
              formation on subsequent course.)



In the face of the present circumstances, which of the follow- 
ing procedures seems the most logical at this juncture? 

a. immediate myelographic study 

b. enforced recumbency on a turning frame with cephalopelvic 
traction 

c. immediate laminectomy and decompression 

d. electromyographic study of the lower extremity musculature
   to determine the precise level of involvement



LEVEL 6: SYNTHESIS 



Items requiring SYNTHESIS of a variety of elements of knowledge into
an original and meaningful whole. Items of this type may be based
on a clinical report which requires the student to develop a differ-
ential diagnosis or a therapeutic regimen. Alternatively, such
questions may be based on a set of data which require the student to
develop an original (to him) theory explaining the phenomena. Such
items will involve the process of working with concepts and principles,
and arranging and combining them in such a way as to constitute a
pattern or structure not clearly there before.

Illustration: None

Appendix 3

TASK FORCE ON WRITTEN EXAMINATIONS

Task:         To determine what kinds of competence are being
              measured in the written examinations currently
              in use.

Procedure:    To classify each question in the January, 1964,
              Part II and May, 1964, Part I examinations,
              according to the kind of intellectual process
              the candidate is most likely to employ in
              answering the question.

Preparation:  In advance of the meeting on December 27, each
              member of the study group should study the attached
              documents, and make a tentative classification of
              the questions appearing on pp. 12 to 19, and be
              prepared to suggest needed changes in the classifi-
              cation system.






DETERMINING WHAT A TEST MEASURES



We know from previous research on examinations that some
questions in a test can be answered by immediate recall of information.
Others require the candidate to reason out the answer. It may, of
course, be impossible to answer questions of this latter type unless
one can recall certain basic information assumed by the questions, but
the significant distinction is that it is impossible to answer the latter
type of question exclusively on the basis of information recalled. The
candidate must be able to go well beyond a process which relies on
rote memory, in using the information at his command, to reason out the
answer.

Among the questions which require the candidate to reason out the
answer, the kind of reasoning process involved will vary with different
kinds of questions. Once it is clear that a question cannot be answered
from rote memory alone, then it is necessary to take a second step: i.e.,
to decide what kind of mental process one would ordinarily go through
in order to answer the question. For example: Does it require the
examinee to apply principles? To evaluate data? To analyze a problem?
The following sections of this document list the various such processes
to be used in the classification of the Board Examinations, together
with illustrations of each category and a definition sufficient for the
purposes of this classification. No attempt has been made to define






1. In deciding first whether or not a question can be answered on
   the basis of immediate recognition and recall, it is necessary
   to look at the incorrect, as well as the correct, answers. By
   a process of elimination a candidate may be able to answer an
   apparently thought-testing question on the basis of recall only
   by excluding obviously wrong answers (and thereby coming to the
   correct answer without knowing or reasoning it out); he merely
   recalls that all the others are wrong.

2. In deciding whether or not a question can be answered on the
   basis of recall, it is necessary to consider both the average
   training program, standard references, typical experience and
   such other aids to learning as are normally available to a candi-
   date. A man may "come" to a most impressive conclusion, but it
   may represent only the most superficial recall if it is of the
   type that he would ordinarily memorize from the standard refer-
   ences employed in his field.

3. In deciding whether or not a question can be answered on the
   basis of recall, it is necessary to try to imagine how the
   candidate would approach the question in an examination situa-
   tion. For example, items that look like simple informational
   questions may actually be ones that candidates characteristically
   reason out. With the amount of knowledge it is necessary to
   acquire in this field, it may be that physicians characteristi-
   cally develop a system of thinking about an area which enables
   them to reconstruct the details through a reasoning, not a
   memory, process. Alternatively, it should be noted that ques-
   tions related to case materials may involve simple recall and
   no reasoning, if the case description is so cut-and-dried
   that it represents a classical textbook description of symptoms
   or if some of the questions can be answered without specific
   reference to the case material.

   To repeat: For purposes of this study, it is necessary
   to try to classify each question on the basis of how a candi-
   date for certification would approach it in an examination
   situation.



4. You will note that the following classification (See Appendix
   2) is arranged in an hierarchical system. If an item involves
   two or more levels (for instance, both recall and application)
   it should be classified at the highest level necessary to use
   in answering it.
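
The rule in item 4 can be sketched in a few lines of Python. This is only an illustration and was not part of the task force materials: the list of levels and their order are assumptions (the text at hand numbers only Evaluation as Level 5 and Synthesis as Level 6), and `classify_item` is a hypothetical helper name.

```python
# Assumed ordering of levels, lowest to highest. Only "evaluation" (Level 5)
# and "synthesis" (Level 6) are numbered in the source; the rest is a guess.
LEVELS = ["recall", "application", "evaluation", "synthesis"]


def classify_item(processes_required):
    """Return the highest level among the processes an item requires."""
    if not processes_required:
        raise ValueError("an item must require at least one process")
    unknown = set(processes_required) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown level(s): {sorted(unknown)}")
    # max() with LEVELS.index as key picks the process latest in the hierarchy.
    return max(processes_required, key=LEVELS.index)


# The memo's own example: an item involving both recall and application.
print(classify_item(["recall", "application"]))  # application
```

A rater who lists every process an item seems to call for would then record only the value returned, which matches the instruction to classify "at the highest level necessary."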
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Xn this. column record time at vhioh each near topic 
was introduced; in addition, record time of specific
questions as frequently as possible.

Whenever visual material is used as the basis for a
question, record the CODE NO. (see key at
bottom of form). Leave blank when no visual ma-
terial is introduced as a part of the question.

Record EACH question with sufficient specificity to
give a reader a clear indication of the nature of
the question. (Each question in a series should be
individually recorded and coded.)

Check ONE of the columns labelled "Recall," "Problem-
Solving," or "Interpretive Skill" to indicate the
PREDOMINANT nature of the process revealed by the
candidate's response. In general, only one of
these three columns should be checked. However,
if you are in real doubt put a question mark in a
second column. When there is evidence that the
candidate is guessing or doesn't know the answer,
place a check in the appropriate column. The "Guess"
and/or "Doesn't Know" columns may be checked whether
or not there is a check in a "Recall," "Problem-
Solving," or "Interpretive Skill" column. If some
other process is involved, note in the Comment column.

Check this column whenever the examiner provides

Check [...] only when the examiner rephrases his
question, offers hints or suggestions or asks lead-
ing questions to assist the candidate; do NOT use
it when the examiner merely says, "What else?" to
mean: "Do you have anything to add to your answer?"

Record any ADDITIONAL information that will assist
in determining the nature of the examination or
the setting. For example, types of stimulus ma-
terial, responses or feedback not specifically
provided for in the columns, distractions or
assists from the second examiner, etc. Since the
verbal column has been omitted from the stimulus
material, also note in the Comment column any ques-
tion that is based on a concrete clinical problem
about a specific patient. DO NOT so classify
clinically oriented problems of a general nature.

At the end of a group of questions, draw a heavy line to separate questions
dealing with different topics or cases.

At the conclusion of the examination, record in the appropriate box at the
end of the form any comments of the examiners, your own summary notes and
the candidate's grades.

Appendix 5



RESIDENT EVALUATION FORM

DO NOT WRITE IN THIS SPACE: Col. No. 1-3, 4-6, 7-9, 10, 11-12

Name of Resident                      Identification No.

Institution                           Code

Name of Rater                         Code



In filling out this form you are to rank the resident on each
factor in terms of all the residents in orthopaedic surgery you
have known during your career. You are to indicate your rankings
by checking the appropriate box under each factor. In making these
evaluations DO NOT take into account the resident's level of train-
ing. For example, a second year resident may have the potentiality
to display outstanding surgical skills, but many fourth year resi-
dents might function AT THE PRESENT time at a higher level. He
should be ranked lower than they are ranked on surgical skill. If
you believe that you do not have sufficient information on the
resident to evaluate a particular factor, check the appropriate
box. Please write your name in the space above. All the infor-
mation collected will be held strictly confidential and will not
be used for any purpose other than research purposes.
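
Each factor on this form is ranked by checking one of twelve boxes, printed in four groups of three from "Lowest quarter" (01-03) to "Highest quarter" (10-12). A minimal Python sketch of that grouping follows; it is offered only as an aid to reading the scale, and the function name `quarter_for_box` is hypothetical.

```python
# Quarter labels in the order they are printed beneath the boxes.
QUARTERS = ["Lowest quarter", "Third quarter", "Second quarter", "Highest quarter"]


def quarter_for_box(box):
    """Map a checked box number (1-12) to the quarter label printed under it."""
    if not 1 <= box <= 12:
        raise ValueError("box must be between 1 and 12")
    # Boxes come in groups of three: 1-3, 4-6, 7-9, 10-12.
    return QUARTERS[(box - 1) // 3]


print(quarter_for_box(5))   # Third quarter
print(quarter_for_box(12))  # Highest quarter
```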



Factor I:  Ability to recall factual information concerning
           general medicine and orthopaedic surgery

           This factor deals with the resident's command of the
           factual information required of a practicing ortho-
           paedist. Residents who score high are those who have
           a great deal of pertinent information at their "finger-
           tips." Residents who score low are those who consistently
           display wide gaps in their knowledge. Residents can score
           well on this factor and low on Factor II below. They may
           recall a great deal of information, but have difficulty in
           integrating the information in solving problems in patient
           treatment and care.

           □ I do not have sufficient information to judge.

                                RANKING

            □  □  □     □  □  □     □  □  □     □  □  □
           01 02 03    04 05 06    07 08 09    10 11 12
            Lowest       Third      Second      Highest
            quarter     quarter     quarter     quarter




Col. No. 13, 14-16, 17-18, 19



Factor II:  Ability to solve problems

           This factor deals with the resident's effectiveness in
           using the information he has collected and recalled in
           solving problems in treatment and diagnosis.

           □ I do not have sufficient information to judge.

                                RANKING

            □  □  □     □  □  □     □  □  □     □  □  □
           01 02 03    04 05 06    07 08 09    10 11 12
            Lowest       Third      Second      Highest
            quarter     quarter     quarter     quarter



Factor III:  Ability to gather clinical information

           This factor deals with the resident's effectiveness in
           gathering clinical information. Is he generally thorough
           and discriminating, or does he fail to gather important
           information and in general is haphazard and inefficient
           in this factor?

           □ I do not have sufficient information to judge.

                                RANKING

            □  □  □     □  □  □     □  □  □     □  □  □
           01 02 03    04 05 06    07 08 09    10 11 12
            Lowest       Third      Second      Highest
            quarter     quarter     quarter     quarter



Factor IV:  Judgment in deciding on appropriate treatment and care

           This factor deals with the resident's ability to properly
           weigh the many factors involved in deciding on treatment
           and care, and to come to sound conclusions.

           □ I do not have sufficient information to judge.

                                RANKING

            □  □  □     □  □  □     □  □  □     □  □  □
           01 02 03    04 05 06    07 08 09    10 11 12
            Lowest       Third      Second      Highest
            quarter     quarter     quarter     quarter




Col. No. 22, 23-24, 25, 26-27, 28, 29-30



Factor V:  Skill in surgical procedures

           This factor deals with the resident's manipulative
           skill in carrying out the procedures required of
           orthopaedists.

           □ I do not have sufficient information to judge.

                                RANKING

            □  □  □     □  □  □     □  □  □     □  □  □
           01 02 03    04 05 06    07 08 09    10 11 12
            Lowest       Third      Second      Highest
            quarter     quarter     quarter     quarter



Factor VI:  Relating effectively to patients

           This factor deals with the resident's tact, consider-
           ation and skill in dealing with patients.

           □ I do not have sufficient information to judge.

                                RANKING

            □  □  □     □  □  □     □  □  □     □  □  □
           01 02 03    04 05 06    07 08 09    10 11 12
            Lowest       Third      Second      Highest
            quarter     quarter     quarter     quarter



Factor VII:  Relating effectively to colleagues and other
             medical personnel

           This factor deals with how effectively the physician
           works as a member of a medical team, in asking advice,
           giving advice and showing tact and consideration.

           □ I do not have sufficient information to judge.

                                RANKING

            □  □  □     □  □  □     □  □  □     □  □  □
           01 02 03    04 05 06    07 08 09    10 11 12
            Lowest       Third      Second      Highest
            quarter     quarter     quarter     quarter

Col. No. 31, 32-33, 34, 35-36, 37-40

Factor VIII:  Demonstrating moral and ethical standards
              required of a physician

           This factor deals with the resident's standards in
           terms of his concern for patients, his financial
           dealings, and his contacts with other physicians and
           society in general.

           □ I do not have sufficient information to judge.

                                RANKING

            □  □  □     □  □  □     □  □  □     □  □  □
           01 02 03    04 05 06    07 08 09    10 11 12
            Lowest       Third      Second      Highest
            quarter     quarter     quarter     quarter



Factor IX:  Overall competence as an orthopaedic surgeon

           □ I do not have sufficient information to judge.

                                RANKING

            □  □  □     □  □  □     □  □  □     □  □  □
           01 02 03    04 05 06    07 08 09    10 11 12
            Lowest       Third      Second      Highest
            quarter     quarter     quarter     quarter

Date completed







t 










Appendix 6



E/U*mi.. iii;.;i 1,'urii !/•„•; 



llc'yr’r . j | : .« 



Y“ - » 

* *• f . ’ * * 



TO CHIEF OF SERVICE:

Please complete one of these forms for each resident who will
take the 1967 Orthopaedic In-Training Examination and return
these with the examination [...] to:

American Academy of Orthopaedic Surgeons
29 East Madison Street
Chicago, Illinois 60602

This information will be used for statistical purposes only and will be kept strictly confidential.



Do Not Write In This Space: Col. No. 9-29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40

RESIDENT'S NAME ....
YEAR IN TRAINING ....

RANKING (the same eight-box scale appears under each factor)

    □  □        □  □           □  □           □  □
    1  2        3  4           5  6           7  8
    Lower       Lower Middle   Upper Middle   Upper
    Quarter     Quarter        Quarter        Quarter

Factor 1:   KNOWLEDGE OF CLINICAL ORTHOPAEDICS

Factor 2:   KNOWLEDGE OF BASIC SCIENCES AS RELATED TO
            ORTHOPAEDICS

Factor 3:   ABILITY TO GATHER CLINICAL INFORMATION

Factor 4:   ABILITY TO USE INFORMATION TO SOLVE PROBLEMS

Factor 5:   JUDGMENT IN DECIDING APPROPRIATE TREATMENT AND CARE

Factor 6:   SKILL IN SURGICAL PROCEDURES

Factor 7:   RELATING EFFECTIVELY TO PATIENTS

Factor 8:   RELATING EFFECTIVELY TO COLLEAGUES AND OTHER
            MEDICAL PERSONNEL

Factor 9:   DEMONSTRATING THE MORAL AND ETHICAL STANDARDS
            REQUIRED OF A PHYSICIAN

Factor 10:  OVERALL COMPETENCE AS AN ORTHOPAEDIC RESIDENT



Appendix 7

29 East Madison Street
Chicago, Illinois 60602

CANDIDATE EVALUATION FORM

INSTRUCTIONS: The physician below has applied for
entrance to the Certification Examination of the
American Board of Orthopaedic Surgery. In reviewing
his application, the Board would like to have some infor-
mation on his capabilities in each of the areas of compe-
tence listed on the following pages. For each area, a
description has been prepared of the effective and ineffec-
tive physician. Please indicate where you believe the
candidate fits in this continuum by drawing a vertical
line across some point on the line below each
description of the factor.



DO NOT 
WRITE 
IN THIS 
SPACE 



Name of Candidate 



Last, 



First 



1 - 10 



Identification Number 



11 - 15 



Date form filled in 



Prepared with the assistance of 
Center for the Study of Medical Education 
University of Illinois College of Medicine 




DO NOT WRITE IN THIS SPACE: 16 - 25, 26 - 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41

Please fill out the following information about yourself.

Name of Rater: ______________________________
               Last                First

Rater's ID No. (To be filled out by CSME)

Rater's Institution

Your relationship to Candidate (Check as many as apply)

□ Chief of Training at institution where he trained
□ Other full-time orthopaedist at institution where he trained
□ Other full-time non-orthopaedic physician at institution
  where he trained
□ Fellow orthopaedic resident at institution where he trained
□ Fellow non-orthopaedic resident at institution where he trained
□ Attending orthopaedist at hospital where he trained
□ Attending non-orthopaedic physician at hospital where he trained
□ Orthopaedic colleague in community where he practices
□ Non-orthopaedic colleague in community where he practices
□ Other (Specify)

Period of acquaintanceship with candidate:

□ 0 - 6 mos.   □ 6 - 12 mos.   □ 1 - 3 yrs.   □ 3 - 5 yrs.   □ over 5 yrs.

Familiarity with candidate's practice:

□ Not familiar          □ Slightly familiar
□ Moderately familiar   □ Very familiar







r* 












Col. No. 42 - 43

Factor 1. INFORMATION GATHERING

This factor is concerned with the Candidate's willingness, ability and
skill in gathering information necessary for diagnosis.

The INEFFECTIVE Candidate limits his interview and physical
examination to the area of complaint and fails to pursue
alternative hypotheses.

He frequently uses therapy to substantiate clinical impressions.

The EFFECTIVE Candidate routinely takes a comprehensive initial
history and physical examination. He records the information
received in a systematic fashion, and pays careful attention to
progress notes.

He is aware of information other than the medical and indicates this
by initiating further procedures and questions.

 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
     Poor     |   Marginal   |     Good     |  Excellent

□ Insufficient information to judge



O 



Factor 2. PROBLEM-SOLVING

This factor is concerned with the Candidate's ability and skill in using
information gained to develop a diagnosis and support clinical activity.

The INEFFECTIVE Candidate has an incomplete comprehension of the
implications of the data he has collected.

He is unable to interpret unexpected results and often ignores them.

He makes decisions on the basis of experience, disregarding the
context in which that experience was gained.

His thinking is rigid and unimaginative, impeding his recognition
of associated problems.

The EFFECTIVE Candidate realizes the importance of unexpected
findings and seeks to determine their implications.

He understands the nature of probability and uses this to illuminate
his experience.

He takes all the data into account before reaching a decision, and
routinely tests alternative hypotheses.

Col. No. 45 - 46

 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
     Poor     |   Marginal   |     Good     |  Excellent

□ Insufficient information to judge



Col. No. 48-49, 50, 51-52, 53

Factor 3. CLINICAL JUDGMENT

This factor is concerned with the Candidate's ability to use sound judgment
in planning for and carrying out treatment.

The INEFFECTIVE Candidate is overly concerned with treatment
techniques at the expense of overall goals.

He often delegates pre- and post-operative care to others.

He plans treatment without sufficient familiarity with the
procedures he selects.

His treatment choice is rigid--using a set formula for treating
each clinical problem or using a favorite technique when more
effective ones are available.

The EFFECTIVE Candidate is familiar with the uses and limitations
of the procedures he attempts. He recognizes his own capabilities
and uses procedures which correspond to them.

He considers simple procedures first.

His clinical judgment encompasses information beyond the pathologic.

He demonstrates regard for patients' needs, desires and life
conditions.

He is flexible enough to modify his treatment plans when the
situation warrants doing so.

 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
     Poor     |   Marginal   |     Good     |  Excellent

□ Insufficient information to judge



Factor 4. SURGICAL TECHNIQUE

This factor is concerned with the Candidate's ability and skill in carrying
out operative procedures.

The INEFFECTIVE Candidate has insufficient skill for the procedures
he attempts.

His overall handling of instruments and tissue lacks finesse.

His operating time is often prolonged through unfamiliarity with
procedures or inadequate planning.

He takes unnecessary operative risks or terminates operation before
maximum results are achieved.

The EFFECTIVE Candidate handles tissues gently, uses careful
haemostasis, and makes a proper and adequate exposure of the
operating field.

He carefully attends to details such as sterilization of instruments
and proper choice of same.

He makes proper application of fixation devices or prosthesis and
makes proper closure of wounds.

He carefully monitors his patient during operative procedure.

He applies appropriate dressings, splints and casts.

 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
     Poor     |   Marginal   |     Good     |  Excellent

□ Insufficient information to judge



Col. No. 54-55, 56, 57-58, 59

Factor 5. RELATING TO PATIENT

This factor is concerned with the Candidate's effectiveness in working with
patients.

The INEFFECTIVE Candidate does not communicate with his patients,
either through aloofness, indifference or the pressure of time.

He has difficulty understanding patient needs.

He is unable to evoke patient confidence, tending even to alarm them.

He reacts negatively to hostility or other emotional displays.

The EFFECTIVE Candidate's manner elicits patient confidence and
cooperation and relieves anxiety.

He is interested in his patient's well-being and demonstrates this
without becoming emotionally involved.

He is honest with the patient and his family.

Patients like and readily feel they can ask questions and discuss
problems with him.

 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
     Poor     |   Marginal   |     Good     |  Excellent

□ Insufficient information to judge



Factor 6. CONTINUING RESPONSIBILITY

This factor is concerned with the Candidate's willingness to accept the
responsibility for long-term patient care.

The INEFFECTIVE Candidate either loses interest after initial
treatment or does not take the time for adequate follow-up.

He becomes discouraged with slow progress and cannot cope with a poor
prognosis. He is unable to communicate realistic expectations to the
patient.

His utilization of support personnel is either inadequate or he
expects assistance beyond their capabilities and training.

The EFFECTIVE Candidate is able and willing to work with the patient
to achieve maximum rehabilitation. He motivates the patient to strive
for his own rehabilitation.

He monitors patients' progress, altering therapy or treatment as
indicated.

He understands the roles of various allied health professions and
makes maximum use of their assistance.

He maintains a positive and persistent attitude toward recovery.

 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
     Poor     |   Marginal   |     Good     |  Excellent

□ Insufficient information to judge



Factor 7. EMERGENCY CARE

This factor is concerned with the Candidate's ability to act effectively in
emergency situations, in the operating theatre or the emergency room.

The INEFFECTIVE Candidate panics easily and makes inappropriate use
of [...] available.

He becomes confused under pressure and has difficulty establishing
priorities. He is unable to delegate aspects of care to others.

He is careless about applying protective measures.

He is unable to make decisions alone.

The EFFECTIVE Candidate quickly assesses the situation, pays
attention to lifesaving procedures and demonstrates understanding of
triage concepts.

He is able to obtain and organize assistance of others.

He is able and willing to make decisions alone if necessary.

He is aware of the consequences of [...].

 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
     Poor     |   Marginal   |     Good     |  Excellent

□ Insufficient information to judge




Factor 8.

This factor is concerned with the Candidate's ability to work effectively with
his colleagues and other members of the health team.

The INEFFECTIVE Candidate has difficulty relating to others and
lacks the ability either to give or take instruction gracefully.

He tends to be tactless and incon[...] with whom he works, [...]vice,
and in an offensive manner.

He is unwilling to make referrals or seek consultation and fails to
support his colleagues in their contacts with his patients.

The EFFECTIVE Candidate relates well to others and communicates
easily, working well in a team situation.

[...] and respects others' views.

He demonstrates self-control.

He gives credit to others for their contributions and creates an
atmosphere of working together--not working for.

 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
     Poor     |   Marginal   |     Good     |  Excellent

□ Insufficient information to judge



Col. No. 66-67

Factor 9. MORAL AND ETHICAL VALUES

This factor is concerned with the Candidate's attitudes and standards as
an individual.

The INEFFECTIVE Candidate attempts to cover up his errors.

He is frequently absent from assigned duty or unavailable when needed.

He has unethical contacts with non-medical professions and allows his
personal finances to unduly influence treatment.

He discusses medical mismanagement with patients.

The EFFECTIVE Candidate's conduct reflects kindness, respect, honesty
and humility.

He reports facts accurately, including his own errors.

He respects the confidences of colleagues and patients.

He places patient care above personal considerations.

He respects the property of others.

He recognizes his own professional capabilities and limitations.

 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
     Poor     |   Marginal   |     Good     |  Excellent

□ Insufficient information to judge



Factor 10. OVERALL COMPETENCE

This factor is concerned with your judgment of the Candidate's overall
competence as an orthopaedic surgeon, taking into account Factors 1
through 9.

 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12
     Poor     |   Marginal   |     Good     |  Excellent

□ Insufficient information to judge
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Candidate's Examination Number; 



Examiner's Name:

Starting Time:

(Cols. 1 - 3)   (Cols. 4 - 5)   (Cols. 6 - 7)

Factor I: Ability to elicit adequate and pertinent information
Weight [...]

(The candidate should ask most of the indicated questions;
other questions should be appropriate to the diagnosis.)

 01 02 03    04 05 06    07 08 09    10 11 12
  □  □  □     □  □  □     □  □  □     □  □  □
   Poor      Adequate      Good      Excellent

Factor II: Ability to communicate with the patient
Weight 1

(Did he use appropriate vocabulary, use concepts familiar to
the patient, and allow the patient to narrate parts of the
history?)

 01 02 03    04 05 06    07 08 09    10 11 12
  □  □  □     □  □  □     □  □  □     □  □  □
   Poor      Adequate      Good      Excellent

Factor III: Efficiency in gathering data
Weight 1

(Did he ask relevant and necessary questions, and avoid
the time waste of exploring remote diagnoses which prevent
an adequate examination of the pertinent facts?)

 01 02 03    04 05 06    07 08 09    10 11 12
  □  □  □     □  □  □     □  □  □     □  □  □
   Poor      Adequate      Good      Excellent




Col. No.

RATING OF DIAGNOSTIC INTERVIEW (Continued)

Factor IV: Ability to arrive at a diagnosis and present logical reasons for it
Weight 4

(Did he fail to consider all the pertinent facts he uncovered,
make errors in relating or interpreting facts, or make errors
in weighing the facts at hand?)

20 - 21

 01 02 03    04 05 06    07 08 09    10 11 12
  □  □  □     □  □  □     □  □  □     □  □  □
   Poor      Adequate      Good      Excellent
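
The form prints a weight beside each factor, but the text at hand does not say how the weights enter a score. One plausible reading, offered here purely as an assumption, is a weighted mean of the box numbers checked. The Python sketch below uses only the weights that are legible on the form (Factors II, III and IV); the ratings are made-up example values and `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(ratings, weights):
    """Weighted mean of factor ratings (box numbers 1-12).

    Factors with a rating of None (not rated) are skipped.
    """
    rated = {factor: r for factor, r in ratings.items() if r is not None}
    if not rated:
        raise ValueError("no rated factors")
    missing = set(rated) - set(weights)
    if missing:
        raise ValueError(f"no weight for factor(s): {sorted(missing)}")
    total_weight = sum(weights[factor] for factor in rated)
    return sum(weights[factor] * r for factor, r in rated.items()) / total_weight


# Weights as printed for Factors II-IV; Factor I's weight is illegible in the source.
weights = {"II": 1, "III": 1, "IV": 4}
# Invented ratings, for illustration only.
ratings = {"II": 6, "III": 9, "IV": 12}
print(weighted_mean(ratings, weights))  # 10.5
```

Under this reading a candidate's diagnostic reasoning (Weight 4) would dominate the composite, which is consistent with the heavier weight the form gives it, but the actual scoring procedure is not stated here.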






Factor V: Overall evaluation of Diagnostic Interview

22 - 23

 01 02 03    04 05 06    07 08 09    10 11 12
  □  □  □     □  □  □     □  □  □     □  □  □
   Poor      Adequate      Good      Excellent

24   Your role:   □ "patient"   □ rater only










25   Comments:

     The Candidate was difficult to evaluate because:

26   □ He spoke slowly
27   □ He spoke rapidly
28   □ He did not speak English well
29   □ He seemed excessively nervous
30   □ He seemed confused about the procedure
31   □ Other

32   □ I did not find the Candidate difficult to evaluate
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33 - 34 



35 - 36 



■T-7 - 38 
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7,3 



RATING OF PROPOSED TREATMENT INTERVIEW

Proposed Treatment Case No.

Factor I: Effectiveness of the candidate's [remainder of title illegible]
Weight 6

(Did he give too little information, oversimplify, indicate
undue pessimism or optimism, overwhelm the patient with
excessive detail or use inappropriate vocabulary?)

01 02 03     04 05 06     07 08 09     10 11 12
Poor         Adequate     Good         Excellent

Factor II: Effectiveness of the candidate's manner
Weight 2

(Was the manner in which the physician dealt with the "patient"
one which would genuinely convince the patient that
the physician is interested in his welfare?)

01 02 03     04 05 06     07 08 09     10 11 12
Poor         Adequate     Good         Excellent

Factor III: Efficiency of the interview in terms of the interaction between
patient and physician

(Did the physician present the required information to the
patient in a clear-cut, efficient fashion?)

01 02 03     04 05 06     07 08 09     10 11 12
Poor         Adequate     Good         Excellent

Factor IV: Overall evaluation of the Proposed Treatment Interview

01 02 03     04 05 06     07 08 09     10 11 12
Poor         Adequate     Good         Excellent
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Starting Time 
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■jscmption ov levels 



OF PISUf ’OXZMiM IC3S 



A ■ 



[Level label illegible]: Reserve this rating for errors of fact or concept that can best be
judged primarily in terms of content.

I. [Label illegible]: Distracts group with inappropriate, irrelevant or illogical
[word illegible]; provokes group with antagonistic, argumentative
statements that reflect or invite hostility; monopolizes discussion
with repetitive insistence on own point of view.

II. [Label illegible]: Passive acceptance or rejection;
concurs, concedes, merely repeats or ratifies suggestions of
others; makes personal, private, idiosyncratic comments or
expresses ideas ineffectively or unclearly.

III. OFFERS CONSTRUCTIVE SUGGESTIONS: Clarifies issues;
clearly and effectively presents useful suggestions, evaluations or
pertinent information; amplifies on suggestions presented by others;
relieves tension and promotes group consensus.

IV. [Label illegible]: Provides orientation
or helps reorient group; assists others to participate; analyzes,
summarizes, synthesizes discussion; reconciles differences;
integrates ideas to achieve group solution.

OVERALL RATING

Candidates who say little or nothing should be rated poor; those for whom most of
the tallies are in [levels illegible] should be rated no higher than poor; good or adequate
ratings depend on the distribution of tallies between levels II and III; an excellent rating
requires that the majority of tallies be in levels III and IV.

INSTRUCTIONS: Make a tally mark in the appropriate box for EACH statement
EACH candidate makes.
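
The overall-rating rule above can be read as a small decision procedure over the tally counts. The following Python sketch is illustrative only and is not part of the rating form. The function name `overall_rating`, the integer level keys (1 to 4 for levels I to IV, with 0 as a hypothetical key for the content-error row), the strict-majority thresholds, the choice to treat a majority in levels III and IV as sufficient for an excellent rating, and the tie-break between good and adequate are all assumptions; the form leaves these details to the rater's judgment.

```python
from typing import Mapping


def overall_rating(tallies: Mapping[int, int]) -> str:
    """Suggest an overall rating from tally counts per discussion level.

    Keys 1-4 stand for levels I-IV; key 0 is a hypothetical slot for the
    content-error row. All thresholds are assumptions, not part of the form.
    """
    total = sum(tallies.values())
    if total == 0:
        # Candidates who say little or nothing are rated poor.
        return "Poor"

    low = tallies.get(0, 0) + tallies.get(1, 0)
    level_2 = tallies.get(2, 0)
    level_3 = tallies.get(3, 0)
    level_4 = tallies.get(4, 0)

    if low > total / 2:
        # Assumed reading: mostly low-level tallies caps the rating at poor.
        return "Poor"
    if level_3 + level_4 > total / 2:
        # The form requires a majority in levels III and IV for excellent;
        # this sketch also treats that majority as sufficient.
        return "Excellent"
    # Good versus adequate depends on the balance of levels II and III;
    # ">=" is an arbitrary tie-break chosen for this sketch.
    return "Good" if level_3 >= level_2 else "Adequate"


if __name__ == "__main__":
    print(overall_rating({2: 4, 3: 2, 1: 1}))
    print(overall_rating({3: 3, 4: 2, 2: 1}))
```

A rater would still apply judgment at the boundaries; the sketch only makes the stated ordering of the rules explicit.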
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OVERALL RATING 
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3 * Poor 
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10 * II* Excellent 
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28 « 31 



82 » 85

Factor I: Individual achievement

This factor deals with the quality of the candidates' discussion. It does
not deal with their effectiveness as participants in a group.

Candidates who score HIGH are those who present solutions to the problem,
effectively amplify on solutions presented by others, and express their
ideas clearly, logically and effectively.

Candidates who score LOW are those who present few ideas of their own,
i.e., who merely sit back and ratify the ideas of others, or who make
inappropriate (irrelevant or unwise) recommendations or who express their
ideas in an unclear, illogical fashion.

Factor II: Ability to assist the group to reach its goals

This factor deals with the effectiveness of the candidates' participation
as members of a group which is charged with the responsibility of
achieving some objective.

Candidates who score HIGH are those who assist others to participate,
summarize what others say, attempt to clarify issues and to reconcile
differences of opinion in order to reach agreement and in general
assist the group to reach some consensus in the allotted time.

Candidates who rank LOW are those whose statements impede the group
in effectively exploring a topic and arriving at a consensus; they may
contribute nothing or their contribution may be disruptive and divisive.

As is the case in Factor I, candidates who say very little or talk about
irrelevancies should score low in this factor. Candidates who score
high in Factor I can nevertheless score low in Factor II if their
presentation so monopolizes the discussion as to impede achievement of
the group's objectives.

40 - 43

Poor   Adequate   Good   Excellent

Poor   Adequate   Good   Excellent

Factor III: Effective conduct as member of a group

This factor deals with how well the candidates work with colleagues
in solving mutual problems and in handling professional differences.
It differs from Factor II in that Factor II deals with what is said,
but Factor III deals with how it is said.

Candidates who score HIGH are those who are able to accept disagreement
without becoming upset, who refrain from sarcastic comments, avoid
interrupting even when they obviously have something important to say,
and in general give the impression that they welcome the participation
of others.

Candidates who score LOW are those who have difficulty controlling
their emotions, interrupt to an undue extent and in general show little
concern for the feelings and ideas of others in presenting their statements.

Candidates can score high on Factors I and II and low on Factor III
because they may present good statements and work hard to get the group
to arrive at a consensus, but, in the process of doing so, they may
antagonize others and cut off discussion abruptly. They would then be
limiting participation and thus reducing the effectiveness of the group.

Poor   Adequate   Good   Excellent

Factor IV: Overall effectiveness in the Patient Management Conference

Candidates who rank HIGH have made an overall good impression in terms
of the interaction of Factors I, II, and III. Candidates may rank
moderate in I, II, and III, but high in IV because the interaction of all
three factors produces a favorable impression.

Candidates who rank LOW have done so poorly in one or more of the
other factors as to render their performance on the whole ineffective.
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IDENTIFICATION 

Candidate's Identification Number

Examiner's Name

Date

Cases                          Alternate

Subject:  1 □ Adult Orthopaedics         3 □ Trauma
          2 □ Children's Orthopaedics    4 □ Interpretive Skills
                                         5 □ Simulations

Starting Time

EVALUATION

Rating scale for each item:

Unable To Evaluate   Definite Failure   Marginal    Good        [label illegible]
00                   01 02 03           04 05 06    07 08 09    10 11 12

Items rated:

Recall of Factual Information
Analysis and Interpretation of Clinical Data
Problem-Solving Ability; Clinical Judgment
Relates Effectively; Shows Desirable Attitudes

Candidate was difficult to evaluate because:

□ He spoke slowly
□ He spoke rapidly
□ He did not speak English well
□ He seemed excessively nervous
□ He seemed confused about the examination procedure
□ Other
□ I did not find the Candidate difficult to evaluate

RESULTS OF MULTIPLE CORRELATIONAL ANALYSIS
USING SUBSCORES AS INDEPENDENT VARIABLES

In reading this table please note that the following
abbreviations are used to identify the tests, sub-tests
and scores listed as independent variables.

Test or Sub-test

DI        = Diagnostic Interview
MC        = Multiple Choice
Prob      = Problem Identification
PTI       = Proposed Treatment Interview (oral)
WS        = Written Simulation
Bio Mech  = Biomechanics
Gen Orth  = General Orthopedics
Hand Surg = Hand Surgery

Score

Diag. Sel.   = Selection of Indicated Procedures on Diagnostic Problems
Diag. Avoid  = Avoidance of Contra-Indicated Procedures on Diagnostic Problems
Treat. Sel.  = Selection of Indicated Procedures on Treatment Problems
Treat. Avoid = Avoidance of Contra-Indicated Procedures on Treatment Problems

Appendix 14 (con't)

RESULTS OF MULTIPLE CORRELATIONAL ANALYSIS
USING SUBSCORES AS INDEPENDENT VARIABLES

First and Second Year Residents
N = 109

Columns: Dependent Variable; R; F; Independent Variables; Partial r; F; Simple r

Dependent Variable        R      F
Factual Information      .53    1.83*
Problem Solving          .51    1.67
Information Gathering    .42    1.02

Independent variables listed for each dependent variable:

Factual Information:
Prob I WS Diag Avoid
PTI Overall
PTI Interaction
Prob II WS Treat Avoid
Prob I WS Diag Sel
Prob II WS Total Sel

Problem Solving:
Prob II WS Diag Avoid
PTI Overall
PTI Manner
MC Trauma
MC Bio Mech
Prob I Diag Sel

Information Gathering:
Prob II WS Diag Avoid
PTI Overall
PTI Manner
MC Bio Mech
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7.48 


**-.19 
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•k *k 


.12 
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* 


.04 
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.24 
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.08 




-.22 


4.71 


* -.10 




-.18 


3.04 
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**-.24 
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.28 


5.56 


•k 


114 
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' 


- . 24 


5.24 
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.01 


'• :• v : 


.20 


3.71 




.23 




*120 


3.61 




*06 


• ’ ' w 1 
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-.16 


2.28 
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W. 77777« i 
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4.46 


* 1*1 


117 


. - ■ ' •' . • 
' •• : • — 

.■7 illil -V ' V. 

• Vv • t 

-1. -1 ■; 1- iA 

■ t 


.20 


3.63 




.14 


.• .• i" .. 7 1 - 

111! i § 


-.20 


2.42 




.04 




-.16 
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2.35 
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APPENDIX 14 (can't) 



RESULTS OF MULTIPLE CORRELATIONAL ANALYSIS
USING SUBSCORES AS INDEPENDENT VARIABLES

First and Second Year Residents
N = 109

(? marks a digit or sign that is illegible in the source.)

Dependent Variable: Clinical Judgment (R = .45, F = 1.19)

Independent Variable       Partial r     F        Simple r
PTI Overall                   .25       5.92*       .09
Prob II WS Diag Avoid        -.18       3.??       -.16
PTI Interaction              -.18       3.0?        .03
PTI Manner                   -.17       2.?0       -.01
Prob I WS Diag Sel           -.16       2.25       -.05

Dependent Variable: Patient Relations (R = .45, F = 1.??)

Independent Variable       Partial r     F        Simple r
Prob II WS Treat Sel          .21       4.10*       .1?
MC Bio Mech                  -.21       3.79       -.08
MC Trauma                     .19       3.40        .15
Prob II Diag Avoid           -.18       3.01        .17
PTI Manner                   -.16       2.40       -.01
Prob I WS Diag Sel           -.16       2.39       -.04

Dependent Variable: Colleague Relations (R = .51, F = 1.62)

Independent Variable       Partial r     F        Simple r
Prob II WS Diag Avoid        -.25       5.98*      -.21
PTI Overall                   .23       4.95*       .19
MC Bio Mech                  -.21       4.11*      -.08
Prob I Diag Sel              ?.19       3.35       -.02
MC Trauma                     .18       3.06        .17
PTI Manner                   -.17       2.78        .07
Prob I WS Diag Avoid         -.16       2.32       -.11

APPENDIX 14 (con't)



RESULTS OF MULTIPLE CORRELATIONAL ANALYSIS
USING SUBSCORES AS INDEPENDENT VARIABLES

First and Second Year Residents
N = 109

(? marks a digit or sign that is illegible in the source.)

Dependent variable labels: only "Overall" is legible.
R and F pairs as printed: R = .55, F = 2.01; R = .48, F = 1.37

First group of independent variables

Independent Variable       Partial r     F        Simple r
PTI Manner                   -.31       9.14**     -.08
Prob II Treat Sel             .28       7.28**      .19
Prob I Diag Sel              -.21       4.09*      -.05
MC Trauma                     .21       3.85        .21
MC Bio Mech                  ?.19       3.30       -.09

Second group of independent variables

Independent Variable       Partial r     F        Simple r
PTI Overall                   .27       6.95**      .12
Prob II Diag Avoid           ?.27       6.79**     -.18
Prob I Diag Avoid            -.21       4.13*       .11
PTI Manner                   ?.19       3.24        .01
MC Gen Orth                   .17       2.77        .18
PTI Interaction              -.17       2.66        .06

APPENDIX 14 (con't)

RESULTS OF MULTIPLE CORRELATIONAL ANALYSIS
USING SUBSCORES AS INDEPENDENT VARIABLES

Third and Fourth Year Residents
N = 119

(? marks a digit, sign or entry that is illegible in the source.)

Dependent Variable: Factual Information (R = .53, F = 1.60)

Independent Variable       Partial r     F        Simple r
MC Gen Orth                  -.20       4.11*      -.20
MC Bio Mech                   .20       4.06*       .20
Adult Oral                    .17       2.96        .17
MC Pathology                  .17       2.92        .17
Prob I Treat Avoid           -.17       2.6?       -.1?
MC Hand Surg                  .16       2.52        .16

Dependent Variable: Problem Solving (R = .56, F = 1.??)

Independent Variable       Partial r     F        Simple r
Adult Oral                     ?        4.87*       .35
Prob I WS Diag Avoid           ?        4.02*        ?
Prob I WS Treat Avoid          ?        3.37         ?

Dependent Variable: Information Gathering (R = .63, F = 2.7?**)

Independent Variable       Partial r     F        Simple r
DI Diagnosis                  .22       4.43         ?
Prob I WS Treat Avoid        ?.20       3.85         ?
Adult Oral                    .20       3.74         ?
MC Bio Mech                   .16       2.53         ?
Prob I WS Diag Avoid         ?.15       2.27         ?
MC Hand Surg                  .15       2.22         ?
APPENDIX 14 (con't)

RESULTS OF MULTIPLE CORRELATIONAL ANALYSIS
USING SUBSCORES AS INDEPENDENT VARIABLES

Third and Fourth Year Residents
N = 119

(? marks a digit, sign or entry that is illegible in the source.)

Dependent Variable: Patient Relationships (R = .49, F = 1.29)

Independent Variable       Partial r     F        Simple r
Adult Oral                    .24       5.84*       .23
Prob II WS Treat Avoid       -.19       3.59       -.?0
DI Communication             -.18       3.15       -.12
DI Diagnosis                  .16       2.50        .05
MC Hand Surgery               .15       2.07        .13

Dependent Variable: Colleague Relationships (R = .53, F = 1.58)

Independent Variable       Partial r     F        Simple r
Prob I WS Treat Avoid          ?        5.06*      -.13
DI Communication               ?        2.??         ?

Dependent Variable: Overall Competence (R = .41, F = 1.??)

Independent Variable       Partial r     F        Simple r
MC Bio Mech                   .19       3.66        .33
Adult Oral                    .17       2.73        .27
MC Gen Orth                  -.16       2.37        .16
MC Hand Surg                  .15       2.09        .21

Note: Independent variables with F-ratios below 2.00 are not shown even
though they make some contribution to the R. Some dependent variables
have not been shown.

*  Sig. at .05 level
** Sig. at .01 level
PTI = Proposed Treatment Interview
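
For readers checking the Appendix 14 tables, the F columns can be related to the correlations through two standard regression identities. The sketch below is not taken from the report: the function names are hypothetical, and the predictor count k = 19 in the example is an assumption, chosen only because it roughly reproduces the printed F of 1.83 for R = .53 with N = 109.

```python
def multiple_r_f(r_multiple: float, n: int, k: int) -> float:
    """F-ratio for a multiple correlation R with n cases and k predictors."""
    df_resid = n - k - 1
    r2 = r_multiple ** 2
    return (r2 / k) / ((1.0 - r2) / df_resid)


def partial_r_f(r_partial: float, df_resid: int) -> float:
    """F-ratio on (1, df_resid) degrees of freedom for one predictor,
    computed from that predictor's partial correlation."""
    r2 = r_partial ** 2
    return r2 * df_resid / (1.0 - r2)


if __name__ == "__main__":
    n, k = 109, 19  # k is an assumed value, not stated in the report
    print(round(multiple_r_f(0.53, n, k), 2))
    print(round(partial_r_f(0.20, n - k - 1), 2))
```

Under the same assumed k, a partial r of .20 gives an F close to the 3.71 printed beside a partial r of .20 in the first-year table, which suggests the identities are a reasonable way to sanity-check rows whose digits are hard to read.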
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Append :i x 15 



Multiple Regression Analysis,
1968 Certifying Examination

NOTE

In reading this table note that the following
abbreviations are used to identify the tests,
sub-tests and scores listed as independent variables.
In each case the variable name lists
first the content or form of the test with a
dash followed by the name of the score or subscore.
All tests other than those specifically
identified as multiple choice or written simulations
are in the form of oral exercises.

Tests or sub-tests

MC      = Multiple Choice
O and I = Observation and Interpretive Skills
WS      = Written Simulation

Scores
same as [remainder illegible]

PS = Problem Solving

N = 391



pr« t»«W*S, u*‘ »*•>' 



Columns: Dependent Variable; Reliability of Dependent Variable (in parentheses);
R; F; Independent Variables; Partial r; F; Simple r

(? marks a digit, sign or entry that is illegible in the source.)

Dependent variables, in printed order: Information Gathering; Problem Solving;
Clinical Judgment

Reliability, R and F entries legible in the source, in printed order:
(.29); (.29), .49, 6.72**; 5.13**; (.22); .34

Dependent Variable: Information Gathering

Independent Variable         Partial r     F        Simple r
Multiple Choice-Recall          .12       4.48*       .21
O and I-Interpretation          .11       4.48*       .21
Trauma-Problem Solving          .10       3.84        .22
Child-Problem Solving           .08       2.43        .17
MC-Problem Solving              .06       1.34        .19
Adult-Problem Solving           .05       0.82        .16
Simulation-Attitudes            .05       0.82        .14
WS-Treat Select                 .04       0.81        .12
WS-Diagnostic Select            .04       0.48        .15
WS-Treat Avoid                 -.03       0.23        .06

Dependent Variable: Problem Solving

Independent Variable         Partial r     F        Simple r
O and I-Interpretation          .16       9.83**      .27
Multiple Choice-Recall          .14       7.56**      .28
Simulation-Attitudes            .09       3.31        .19
Trauma-Problem Solving          .08       2.48        .21
MC-Problem Solving              .08       2.47        .23
WS-Treat Select                 .07       2.03        .14
Adult-Problem Solving           .06       1.39        .1?
Child-Problem Solving           .03       0.32        .14
WS-Treat Avoid                   ?        0.10        .07

Dependent Variable: Clinical Judgment

Independent Variable         Partial r     F        Simple r
O and I-Interpretation          .13       6.21*       .22
Trauma-Problem Solving          .09       2.79        .20
Multiple Choice-Recall          .08       2.62        .21
Simulation-Attitudes            .08       2.46        .17
Adult-Problem Solving           .06       1.24        .16
WS-Treat Select                 .05       1.01        .13
WS-Diagnostic Select            .04       0.69        .15
WS-Problem Solving              .04       0.55        .16
WS-Diagnostic Avoid             .01        .03       -.04
WS-Treat Avoid                 -.01        .02        .06


Columns: Dependent Variable; Reliability of Dependent Variable (in parentheses);
R; F; Independent Variables; Partial r; F; Simple r

(? marks a digit, sign or entry that is illegible in the source. The Simple r
column was extracted separately from the rows and is listed after each group
in printed order; its row alignment is not recoverable.)

Dependent Variable: Surgical Technique (reliability .25; R = .26; F = 2.42**)

Independent Variable         Partial r     F
O and I-Interpretation          .12       5.39*
Child-Problem Solving           .08       2.28
Adult-Problem Solving           .07       1.94
MC-Problem Solving              .04       0.59
Trauma-Problem Solving          .04       0.51
WS-Treat Select                 .04       0.48
MC-Recall                       .04       0.46
Simulation-Attitudes            .03       0.?0
WS-Diagnostic Avoid             .01       0.?4
WS-Diagnostic Select           -.01       0.03

Simple r, in printed order: ?, .14, .14, .12, .12, .08, .13, .10, .01

Dependent Variable: Patient Relations (reliability .2?; R = .21; F = 2.80**)

Independent Variable         Partial r     F
Simulation-Attitudes            .11       4.33*
WS-Diagnostic Select             ?        4.20*
O and I-Interpretation          .10       3.78
Adult-Problem Solving           .06       1.30
WS-Treat Select                 .05       0.88
WS-Diagnostic Avoid             .03       0.39
WS-Treat Avoid                 -.02       0.15
Trauma-Problem Solving          .02       0.12
MC-Problem Solving              .01       0.05
Child-Problem Solving            ?         ?

Simple r, in printed order: .15, .19, .17, .12, .14, -.05, .02, .10, .09

Dependent Variable: Continuing Responsibility (reliability .23; R = .29; F = 3.20**)

Independent Variable         Partial r     F
O and I-Interpretation          .10       3.64
WS-Diagnostic Select            .10       3.42
Adult-Problem Solving           .07       2.10
Child-Problem Solving           .07       1.93
Simulation-Attitudes            .0?       1.03
MC-Recall                       .05        ?
WS-Treat Avoid                  .03        ?
WS-Treat Select                 .04       0.49
WS-Diagnostic Avoid             .04       0.49
WS-Diagnostic Avoid             .03       0.30
MC-Problem Solving              .01       0.05

Simple r, in printed order: .18, .17, .15, .15, .12, .16, .09, .13, ?, -.03, ?



APPENDIX 15 (cont'd)

N = 391

Columns: Dependent Variable; Reliability of Dependent Variable (in parentheses);
R; F; Independent Variables; Partial r; F; Simple r

(? marks a digit, sign or entry that is illegible in the source.)

Dependent Variable: Emergency Care (reliability .28; R = .?; F = 3.0**)

Independent Variable         Partial r     F        Simple r
O and I-Interpretation          .14       7.85*       .21
Trauma-Problem Solving          .07       1.67        .16
Adult-Problem Solving           .06       1.42        .14
WS-Diagnostic Select            .06       1.38        .11
Child-Problem Solving           .06       1.25        .13
WS-Diagnostic Avoid             .04       0.63       -.02
MC-Problem Solving              .04       0.58        .14
Simulation-Attitudes             ?        0.55        .12
Multiple Choice-Recall          .04       0.53        .15
WS-Treat Select                -.03       0.42        .04
WS-Treat Avoid                 -.03       0.40        .03

Dependent Variable: Colleague Relations (reliability .26; R = .26; F = 2.5**)

Independent variables, in printed order: O and I-Interpretation; WS-Treat Select;
Adult-Problem Solving; Simulation-Attitudes; Multiple Choice-Recall;
WS-Treat Avoid; WS-Diagnostic Avoid

Printed values (partial r, F, simple r), in order; the number of rows does not
match the list of variables, so row alignment is not recoverable:
(.07, 3.31, .16); (.0?, 2.50, .14); (.07, 2.23, .14); (.07, 1.78, .12);
(.06, 1.44, .14); (-.06, 1.31, -.01); (.04, 0.66, -.01); (.03, 0.45, .12);
(?, ?, .12)

Dependent Variable: Ethics (reliability .28; R = .25; F = 2.41**)

Independent variables, in printed order: Adult-Problem Solving;
Multiple Choice-Recall; WS-Treat Select; Simulation-Attitudes;
Child-Problem Solving; Trauma-Problem Solving; MC-Problem Solving;
WS-Treat Avoid; WS-Diagnostic Select; O and I-Interpretation;
WS-Diagnostic Avoid

Printed values (partial r, F, simple r), in order:
(.09, 3.05, .16); (.07, 1.60, .16); (.06, 1.39, .11); (.06, 1.19, .12);
(.06, 0.89, .12); (.05, 0.73, .14); (.04, .60, .13); (-.04, .47, .02);
(.03, .35, .12); (?.03, .26, .11); (.02, .09, -.04)



APPENDIX 15 (cont'd)

N = 391

(? marks a digit that is illegible in the source.)

Dependent Variable: Overall Competence (reliability .31; R = .37; F = 5.28**)

Independent Variable         Partial r     F        Simple r
O and I-Interpretation          .12       5.48*       .23
Multiple Choice-Recall          .12       5.46*       .26
Multiple Choice-PS              .09       3.13        .22
Adult-Problem Solving           .08       2.24        .18
Simulation-Attitudes            .06       1.21        .15
Trauma-Problem Solving          .05        .86        .18
WS-Treat Select                 .05        .84        .12
Child-Problem Solving           .05        .77        .14
WS-Diagnostic Select            .03        .70        .15
WS-Diagnostic Avoid             .02        .19       -.03

*  Sig. at .05 level
** Sig. at .01 level

Appendix 16

American Board of Orthopaedic Surgery
INSTRUCTIONS TO EXAMINERS
FOR ORAL EXAMINATIONS

INTRODUCTION

The oral examinations will consist of 5 one-half hour examinations.
Three of these, Problem-Solving Adult, Problem-Solving Children's
and Problem-Solving Trauma, will focus on the candidate's ability
to handle realistic clinical problems. The fourth, the Simulated
[title illegible], will focus on ability to relate effectively to patients
and colleagues. The fifth, Observation and Interpretation, will
focus on ability to observe and interpret data. Detailed instructions
for administering and evaluating each of these examinations
are given in the following sections of this document. A separate
document entitled Instructions to Candidates is enclosed so that
you will have an opportunity to review the information the candidates
have been given about the examinations.

* 

This document describes all of the oral examinations. However, you
will wish to give special attention to the sections relating to the
examination you are administering and the general instructions on
rating. Copies of all the cases you will be using will be made
available the night before the oral examinations.



Prepared with the Assistance of 

The Center for the Study of Medical Education 
University of Illinois College of Medicine 



PROBLEM-SOLVING ADULT, CHILDREN'S AND TRAUMA

The Process of Problem Solving

The main purpose of these examinations is to evaluate the candidate's
ability to reason correctly and logically and to arrive at a diagnosis
or plan of treatment for a particular patient based upon a consideration
of all criteria.



The analysis of the oral examinations given in the past revealed
that they often duplicated the written examination in testing for
knowledge content (recall, remembering). This important area can
be tested by written examinations, and it would seem that the hundreds
of man hours invested in orals could be used more profitably
to obtain information about the candidate's thought processes, skills
and attitudes that cannot be readily assessed in the written format.

Since the majority of a candidate's knowledge is intended for application
to problems in real life, our task in evaluating his problem-solving
ability will be easier if the examiner starts with the premise
that possession of knowledge and the ability to apply it are
synonymous. It may also help to consider the steps in problem
solving:



(1) The problem is identified. (History, examination, laboratory)

(2) Depending upon his familiarity with such problems, the candidate
perceives the problem as:

    A. Immediately having familiar aspects to guide thinking.
    B. Initially unfamiliar, so he searches for familiar elements.

(3) Reconstructs familiar elements to make them more completely
resemble a familiar patient problem.

(4) Reinterpretation of the patient problem in light of the
available data about the patient and the present state of
knowledge. (clinical judgement)

(5) Selects "orthopaedic principle," theory, idea or method
of generalization suitable to problem.

(6) Applies principle to problem. (tentative diagnosis or
treatment)

(7) Arrives at solution and confirms it. (working diagnosis
or treatment)

While this process is based upon the candidate's possession of
basic knowledge, in this examination we are interested in how he
applies whatever knowledge he has. If it is a diagnostic problem,
does he "jump" to conclusions or does he systematically rule out
alternative possibilities? In the therapeutic problems does he have
the ability to reason effectively in supporting his decisions about
treating a case for which there is not necessarily one acceptable
solution?

Formats to be Used

The examinations in Adult and Children's will include two types of
problems. The Diagnostic Problem will require the candidate to
elicit information concerning a particular patient from the examiner
and then present his conclusions and the reasons supporting
them. The second type, the Defense of Therapy Problem, requires
the candidate to review the findings concerning a patient and outline
and defend a course of treatment.

In case a candidate either "blocks" on a problem or solves it with
extraordinary dispatch, the examiner will have one additional
Defense of Therapy Problem available.

The Trauma examination will include the Emergency Treatment and
Complication Problems. The Emergency Treatment Problem will require
the candidate to outline his treatment of an emergency patient with
multiple injuries. The Complication Problem will require the candidate
to describe and defend his management method. The Trauma
examiners will have an additional Complication Problem to use in
case a candidate "blocks" on a problem or finishes his problems
very quickly.




Procedures to be Followed

Diagnostic Problem

At the beginning of the examination you should hand the candidate
the case description which indicates the age, general appearance,
occupation and chief complaint of a patient. You should instruct
the candidate that his task is to elicit data on the history, physical
examination, laboratory findings and x-ray findings from you.

Unlike some earlier experiments with this type of exercise, you will
not be "role playing" a patient during the inquiries on the history,
but you will simply give the historical findings as requested. IT
IS EXTREMELY IMPORTANT, however, that you insist that the candidate
be specific in his inquiries, and you refuse to answer any vague
questions about the "general condition" of the patient. Be very
careful about giving irrelevant clues to the candidate. If, for
example, he asks, "Does the patient have an earlier injury to his
elbow?", do not say, "Yes." But say, "The patient says he hurt his
arm when he was seven years old." It will then be up to the candidate
to find out if this arm injury was of the elbow or some other
fracture.



It is recognized that the candidates will from time to time ask
unanticipated questions. Answers to these questions will have
to be "ad libbed" on the spot so as to fit as precisely as possible
the case described. If the candidate asks pertinent questions for
which you have not supplied an answer, give an answer consistent
with the case and diagnosis. Part of the test is, after all,
designed to determine the candidate's ability to elicit the proper
information regarding the case. If, on the other hand, the
unanticipated questions are irrelevant or immaterial, you will have to
answer in a vague or non-specific way. Sometimes a simple "I don't
know" is best.



You will be supplied x-rays. If the candidate requests an x-ray
which is not available, you may describe any abnormalities that
would have been present.



After about 10 minutes of information gathering you should stop
the candidate and ask him for his diagnostic impressions and his
reasons for preferring this diagnosis over other possibilities.
You may then ask a few more questions designed to probe the
candidate's mental processes. You should not, however, engage in a
"debate" with the candidate, as more information is probably obtainable
by exposing him to another problem.



Defense of Therapy Problem 



At the beginning of the examination you should give the candidate
the description of the case. The description will inform him that
he may obtain additional pertinent information from you. You should
instruct him that his task is to formulate a definite treatment
plan for the patient and to explain his reasons for recommending it.
You should allow him about 3 minutes to read the description and
about 10 minutes to describe his procedures.









You should question his recommendations and conclusions in order to
discover the criteria the candidate is using to arrive at the
solution and the rationale upon which his recommendations are based.
Remember, you are not so much interested in what method of treatment
he uses as in his reasons for choosing that method. Does he use
available data and orthopaedic principles, or is he a "cookbook"
orthopaedist who has one answer for a given situation and doesn't
want to be "confused by the facts"?



In discussing the case with the candidate you should strive to put
him at ease by avoiding a threatening manner and emotional dialogue.
You should avoid giving him clues as to what you think is the
optimal course.



You can control the tempo of the examination by keeping in mind
the following criteria for judging what treatment the candidate elects.



Has he identified the problem?

Does he understand it in his own terms? (categorised into the
proper "model system")

Did he have the proper data?

Did he ask for more pertinent data?

Does he use available knowledge about the condition?

What does he expect to accomplish?

Is this based upon the patient's needs?

Does he select the proper principles to apply to this specific
problem?

Can he distinguish between scientifically discovered and proven
principles and empirically based ones?

Does he know which type he is using?

Did he weight various factors properly?

What were his reasons for weighting factors low or high in
arriving at a therapeutic plan?

Could he apply his method to this given case?

Will it accomplish the desired result?

Will it be the method of least risk to the patient?

Does he anticipate complications?

How will he deal with them?

Is he aware of alternate plans?

Why not use them?

What if his plan fails? 

How will he judge the end result (criteria)? 

Your questions can then be designed to see how well candidates meet 
these criteria. 



Emergency Treatment Problem



The main purpose of this examination is to gain information on the
candidate's understanding of the most effective ways in which he
can meet his responsibilities as a physician and an orthopaedist in
providing emergency care for multiple injuries.

You should give the candidate the case description and instruct
him to outline a diagnostic and treatment program for the patient
described, and to indicate what priorities he would establish for
each step in his plan. You should permit him to outline his
entire plan of evaluation and treatment up to such time as
non-orthopaedic consultation is available. If he attempts to solve the
problem by resorting to immediate consultation, suggest to him that
no consultation is immediately available.

After the candidate has outlined his plan you may then attempt to
ascertain his reasons for those recommendations and the priorities
he would assign to each component. You may give him feedback as
to the results of various moves as he goes along so that he can
use such data in deciding on his subsequent moves. You should
allow the candidate about 3 minutes to read the case description,
and about 10 minutes for the discussion.



You should avoid giving the candidate clues by pointing out errors
in priority or action; instead, focus on the reasoning behind his
recommendations. You are attempting to find out WHY he chooses
a given step at a particular time in the management of the case.
Does he establish priority of treatment based upon knowledge, data
at hand and judgement as to the needs of this particular patient
at this time, or is he simply following a set routine?

This technique in many ways is a variant of the Defense of Therapy
Problem. The instructions for that technique also apply to this one.



Complication Problem



The main purpose of this examination is to obtain information on
the candidate's ability to formulate a plan of management for a
patient in which some complications have developed. The candidate
need not be expert in the detailed management of such illness, but
the successful candidate should be able to indicate the most
effective methods of defining the patient's problem and organizing
effective help into a plan of treatment.

At the beginning of the examination give the candidate the
description of the problem, and inform him that his task is to outline a
plan of management including further diagnostic and therapeutic
measures he believes necessary. You should provide him with
feedback on the results of various steps in his management. You should
allow the candidate about 3 minutes to study the case description,
and about 10 minutes to discuss his procedures.

This technique, like the Emergency Treatment Problem, is a variant
of the Defense of Therapy Problem. The instructions regarding
that technique also apply to this one.

Conduct of the Examinations

Children's Examination

The first problem used should be the Diagnostic Problem. When the
candidate enters the room, check his identification and give him
the case description of the Diagnostic Problem. Allow him about
a minute to review the sheet and then ask him to proceed. Give
him about 10 minutes and then ask him for his diagnostic impressions.
If he wishes to give you his impressions earlier, he may do so.
He should be given about 3 minutes to describe his impressions and
to answer any questions you may pose. You will then give him the
Candidate's Case Description of the Defense of Therapy Problem
and allow him about 3 minutes to read it. He should then proceed
to describe his suggestions for therapy. These suggestions should
take about 10 minutes. At this point stop the examination and
dismiss him. After he leaves, mark the Rating Form and write any
pertinent notes or comments on the back. If the candidate "blocks"
on the Defense of Therapy Problem or finishes both problems very
quickly, use the back-up to fill the remaining time.

Trauma Examination

The first problem used will be the Emergency Treatment Problem.
When the candidate enters the room, check his identification, give
him the Case Description of the Emergency Treatment Problem and allow
him about 3 minutes to read the problem. He should then proceed with
his description of the therapeutic steps he would follow. These
suggestions should take about 10 minutes. At the end of this time,
you should hand him the Case Description of the Complication
Problem and give him about 3 minutes to read it. He should then be
given about 10 minutes more to discuss the problem with you. At
this point stop the examination and dismiss him. After he leaves,
mark the Rating Form and write any pertinent notes or comments on
the back. If the candidate "blocks" on any problem or finishes
both problems very quickly, use the back-up Complication Problem
to fill the remaining time.

Rating the Problem Solving Examinations



General 

All of the candidates in all examination subjects are to be rated
on all four factors described in the special rating forms supplied
by the Board. However, some factors will be much more easily rated
than others in this examination. The Examination Committee of the
Board will take this into account in deciding on the appropriate
weights to be assigned for each factor in the examination. If you
believe that you simply cannot rate a candidate on a particular
factor in an examination, check the box entitled "Unable to Rate" on
the Rating Form. Note that the main emphasis in the Problem-Solving
Adult, Children's and Trauma portions of the examination is on
problem solving and clinical judgement, as described on the Rating Form
and in the special notes below. You should conduct your examinations
to arrive at a clear impression of the candidate's rating on this
factor. You should not be too greatly concerned about other factors;
they are being extensively probed in other parts of the examination.

Do not, however, use the "Unable to Rate" option unless you have
absolutely no impression of the candidate's ability in these areas.
If, for example, he fails to read an x-ray properly, mark him as a
failure in factor 2, "ability to analyse and interpret clinical data."
Your impressions, which by themselves are unreliable, can, when
combined with other data, serve to give a reliable overall picture of the
candidate's ability.

In addition to the notes below, be sure to carefully review the 
Rating Form and the General Comments on Rating. 

Notes on Rating Problem Solving for Each Technique Used

Diagnostic Problem. In rating this examination the examiner should
keep in mind that reasonable, efficient thoroughness in gathering
data, coupled with intelligent use of the data so gathered in arriving
at a realistic primary diagnosis, are the keys to the candidate's
success in this part of the examination. The cases have been selected
to avoid both obvious, straightforward problems and rare, unusual cases.
The candidate's approach is extremely important. Not all of us are
so mechanically efficient that we don't make occasional false starts
or take an occasional wrong turn in working out a diagnostic problem.
However, we must realise that gross inefficiency can waste so much
time as to lead to decreases in the quality of patient care that can
be delivered by the medical profession. Furthermore, the ability to
acquire data loses much of its value if the information acquired is
not synthesised into some realistic and meaningful conclusion.
However, the emphasis in this portion of the examination should be as
much on the candidate's methods of arriving at a diagnosis as it is
on his obtaining a precisely correct diagnosis.

Defense of Therapy Problem. In this examination the most acceptable
candidate will be able to:

(1) recognise the basic problem,

(2) ask for further data which are reasonable in light of the
basic problem,

(3) base his reasoning on an accurate comprehension of relevant
facts and generally accepted principles,

(4) choose an effective and practical therapeutic solution in the
light of the data presented,

(5) demonstrate ability to respond flexibly and reasonably to
questions about the handling of complications or failure of
initial therapy,

(6) choose valid criteria to judge results.

The unacceptable candidate will:

(1) be unable to recognise the basic problem,

(2) request additional data in a "shotgun" manner,

(3) treat the case on the basis of indications alone, not
recognising the need for prerequisite conditions,

(4) suggest treatment regimens tailored to his own preconceived
notions rather than to a particular patient,

(5) adhere to a rigid or ineffective approach,

(6) fail to judge results well.

Emergency Treatment Problem. The acceptable candidate in this
technique will be able to outline a realistic program to evaluate
life-threatening situations, to manage such problems in a way that
minimizes dangers to the patient, and to lay the foundation for
treatment or diagnostic routines that will be needed for other
problems. The inadequate candidate will fail to recognise the problem
or will delay evaluation and treatment of the problems to the point
of jeopardizing the patient's life.

In explaining his reasons for the treatment selected, the adequate
candidate will have sound reasons for his procedures, in contrast to
the "treatment by rote" exhibited by the individual who simply
follows a prescribed routine.

The adequate candidate will also reveal his problem-solving skills
by asking for relevant information about the patient's condition
and by using this information in the course of his discussion of
diagnosis and treatment. He will be alert to possible complications.
The inadequate candidate may overlook the implications of data
presented, or complicate matters by recommending time-consuming
studies or procedures that require excessive manipulation of the
acutely injured patient.



Complication Problem. In rating this technique, the acceptable
candidate will:

(1) correctly appraise the problem,

(2) choose a realistic plan in the light of data presented,

(3) continue to follow the patient,

(4) be aware of possible complications.

The unacceptable candidate will:

(1) not recognise the problem,

(2) treat the case in an unreasoning conditioned manner,

(3) fail to follow the patient,

(4) fail to recognise complications.

SIMULATED INTERVIEWS

Description of the Examination 









The critical incident study of the critical performance requirements
for orthopaedic surgeons listed a number of requirements which dealt
with the orthopaedist's ability to handle situations requiring
interaction between himself and patients and himself and colleagues. It
has been found through extensive research and experimentation that
the most effective and efficient way to gather such information is
through role playing in which you play the role of a patient, a
consulting physician or nurse, or other member of the health team,
and the candidate plays the role of an orthopaedic surgeon.

In each situation the candidate will be given a description of a
situation based upon a clinical case history. He will then be
expected to play the role of a physician in two or three typical
situations such as:

(1) explaining the next step in diagnosis or treatment,

(2) discharging a patient from the hospital,

(3) explaining to a patient that he has found no abnormality,

(4) discussing a poor prognosis with the patient or family,

(5) talking to a patient who has consulted other orthopaedists,

(6) talking to a nurse about a change in procedures.

Some Suggestions on Role Playing

In this interview it is particularly important that the examiner act 
as a typical person of the age, sex, educational and occupational 
level described. 

In order to assist in keeping the interview moving and to assure that
each candidate faces a comparable situation, some suggested questions
are noted below. This list is not exhaustive or appropriate for every
discussion; use each when it seems most appropriate.

Typical patient

1. What is wrong with me?

2. Why do I have to go to the hospital?

3. Can I return to my usual occupation?

4. If it follows the average course, what can I expect?

5. How long will I be off work?

6. Do I need an operation?

7. What is going to be done at the hospital?

8. Can I have another opinion?

9. Do you think I should go to a chiropractor?

10. May this recur?

11. Might this be a cancer?

12. Is this treatment dangerous?

13. Do I have arthritis?

Typical parent

1. Will he be able to walk?

2. Will he have any deformity?

3. Are all of these x-rays necessary?

4. Does this run in the family?

5. Why can't I stay with the child?

6. Can we wait a while?

7. Will he be normal?

Interviews

The examination will be administered by two examiners who will use
the same room for the half hour. When the candidate enters
the room he should be introduced to both examiners and then given
the description of the first situation. One examiner will "role
play" while the other observes. At the end of the first simulation
the examiners will change places: the examiner who observed the last
simulation will "role play" in the same fashion as the first. At
the end of the second simulation the examiners will again change
places and a third simulation will be conducted.

At the end of the third simulation the bell will ring and the
candidate should proceed to the next examination. At this point both
examiners should mark their Rating Forms independently.



Rating the Simulated Interviews



General 






All of the candidates in all examination subjects are to be rated
on all four factors described on the special rating forms supplied
by the Board. However, some factors will be much more easily rated
than others in this examination. The Examination Committee of the
Board will take this into account in deciding on the appropriate
weights to be assigned to each factor in this examination. If you
believe that you simply cannot rate a candidate on a particular
factor, check the box entitled "Unable to Rate" on the Rating Form.
Note that the main emphasis in the Simulated Interview is on
ability to relate effectively, as described on the Rating Form and
in the special note below. Conduct your examination to arrive at
a clear impression of the candidate's rating on this factor. Do not
be too greatly concerned about the other factors; they are being
extensively probed in other parts of the examination. Do not, however,
use the "Unable to Rate" option unless you have absolutely no
impression of the candidate's ability in these areas. If, for example,
he fails to read an x-ray properly, mark him as a failure in factor
2, "ability to analyse and interpret clinical data." Your
impressions, which by themselves are unreliable, can, when combined with
other data, serve to give a reliable overall picture of the
candidate's ability.



In addition to the notes below, be sure to carefully review the
Rating Form and the General Comments on Rating.



Note on Rating Ability to Relate In the Simulated Interviews 



This factor contains a number of elements which you should keep in
mind in evaluating the candidate's performance. First, you should
consider the information the candidate provides in terms of its
effectiveness in meeting the goals of the interview. Candidates can
do poorly in a number of ways.



(1) They can say things which would cause needless alarm,
discomfort or embarrassment. For example, they can over-emphasize the
consequences of lack of treatment.

(2) They can fail to speak honestly to the patient. For
example, they can make over-optimistic claims regarding the effectiveness
of treatment.

(3) They can make statements which reveal unprofessional
attitudes toward patients, colleagues or other medical personnel, or which
could be construed as tactless, indiscreet, or undignified.

Second, you should consider the manner in which the candidate conducts
himself in the interview. Does the way he conducts the interview
in terms of posture, voice, mannerisms, expressions, gestures, etc.,
communicate genuine concern and interest in the problems of the
person involved?



OBSERVATION AND INTERPRETATION

General Description of the Examination

Ability to observe and interpret accurately is an integral part
of the requirements for the successful practice of orthopaedic
surgery. The conclusions drawn from direct observation frequently
determine the diagnosis, decide the course of treatment or determine
the efficacy of treatment. This examination is directed toward
the evaluation of the candidate's ability to observe and correlate
information derived from microscopic slides and x-rays.

The examination will last approximately one-half hour and will
consist of 5 or 6 sets of materials, some of which will involve the
interpretation of x-rays, some will deal with the interpretation of
pathology slides, and others will require the correlation of slides
with x-rays. For each set of materials the emphasis will be on
the candidate's accuracy of observation and interpretation of what
he sees.



Administration of the Observation and
Interpretation Examination






You should first present the material to the candidate and instruct 
him to describe what he sees precisely as he might in a written 
report, indicating any abnormalities that may be present.

If the candidate fails to interpret the material properly you might
supply him with some additional historical, physical examination or
laboratory data which would assist him. You should not provide this
additional information until AFTER he has initially described his
findings. If he identifies some abnormalities you should then ask
additional questions which would probe his ability to interpret what
he sees in the light of his knowledge and understanding of
physiological and pathological processes. For example, you might ask him
to speculate as to the reason that the structures on the slide show
the patterns they do, or you might ask the probable effect on the
abnormality of various types of treatment.

DO NOT SPEND TOO MUCH TIME ON ANY ONE EXERCISE. Remember, you are
mainly concerned with what the candidate sees and how he interprets
it. If you ask too many questions about diagnosis the candidate may
simply answer them on the basis of his basic information and not on
the basis of any observational skills. Although we recognise that
it is extremely important to assess the candidate's basic store of
information on diagnosis and treatment, this area of competence is
being probed thoroughly in other portions of the examination.

Rating the Observation and Interpretation Examination

All of the candidates in all examination subjects are to be rated on
all of the four factors described on the special rating forms supplied
by the Board. However, some factors will be much more easily rated
than others in this examination. The Examination Committee of the
Board will take this into account in deciding on the appropriate
weights to be assigned to each factor. If you believe that you simply
cannot rate a candidate on a factor in a particular examination,
check the box entitled "Unable to Rate" on the Rating Form. Note
that the main emphasis in the Observation and Interpretation
Examination is on factor 2, the ability to analyze and interpret clinical
data. Conduct your examination to arrive at a clear impression of his
ability in this factor and do not be too greatly concerned about the
others. They are being extensively probed in other parts of the
examination.

Do not, however, use the "Unable to Rate" option unless you have
absolutely no impression of the candidate's ability in those areas.
Your impressions, which by themselves are unreliable, can, when
combined with other data, serve to give a reliable overall picture of
the candidate's ability.

In addition to the above, please carefully review the Rating Form
and the General Comments on Rating.

GENERAL COMMENTS ON RATING

Arriving at a Rating for the Entire Half-Hour

It is recognized that some candidates may do better on one exercise
than another. Your task is to reconcile the ratings on each exercise
to arrive at an overall judgement. In most cases it is probably
best simply to average the ratings, but this need not always be
true. It may turn out that the candidate's performance on one
exercise demonstrated such effective or such poor performance that you
wish to give it more weight than a mere averaging would allow. It
is perfectly all right for you to do so.

Your ratings by themselves will not serve to pass or fail the
candidate. The ratings of all the oral examiners will be gathered
together with the results of the written examinations to provide the
Board with an overall picture of the candidate. The Board
will decide who fails, based upon the data presented. If this
system is to work, it is extremely important that you report your
impressions of the candidate as you saw him during the one half-hour.
In many cases examiners tend to be lenient because they realise that,
in the absence of other data, failure in a half-hour test is hardly
indicative of inadequacy. However, in this case your "failing" grade
will not be crucial unless others find similar evidence of inadequacy.
If this system is to work, you must rate the candidate as failing if
he failed on the problems given him during the half hour you saw him.

Errors to be Avoided in Rating

Most raters tend to rate individuals near the upper half of the
scale. If you find that most of your ratings are in the good and
excellent ranges, you should review your judgements to make certain
that you are not overlooking some weaknesses. Remember that a rating
on any one factor is only a small part of the evaluation of each
candidate. On the other hand, some raters tend to rate individuals
at the bottom half of the scale. If you find that most of your scores
are in the poor and marginal ranges, perhaps you should re-examine
your judgements. Remember that few can perform perfectly, but many
can do tasks in a reasonably competent fashion.



Making judgements is a difficult task. Some judges tend to avoid
the task by rating everyone average. If most of your marks are in
the marginal and good categories, perhaps you are failing to take
into account some individual patterns of strengths and weaknesses.

Since the factors described are interrelated, some judges tend to
rate a man at the same level on all factors. Experience indicates,
however, that people do have different patterns of ability. For
example, someone who may possess a great deal of problem-solving
ability may find it difficult to relate to patients and colleagues.
Therefore, rate the candidate separately on each factor.



Appendix 17 

AMERICAN BOARD OF ORTHOPAEDIC SURGERY

INSTRUCTIONS

TO CANDIDATES

Oral Examinations






As indicated on the information sheet sent to you earlier, the oral
examination will consist of five one-half hour examinations. Three
of these, Problem-Solving Adult, Problem-Solving Children's and
Problem-Solving Trauma, will focus on your ability to deal with
realistic clinical problems. The fourth, Simulated Interview, will focus
on your ability to relate effectively to patients and colleagues, and
the fifth, Observation and Interpretation, will focus on your ability
to observe and interpret data. All of the examinations will require
you to discuss realistic case material. In the first four examinations
the case material will be presented to you in the form of written
descriptions; in the last examination you will be required to observe and
interpret x-rays and slides. Detailed descriptions of the types of
problems to be used and the procedures you will be asked to follow are
given below.



Please read these instructions carefully because there will not be
time for the examiner to present detailed instructions during the
examinations.



THE PROBLEM-SOLVING ADULT, CHILDREN'S
AND TRAUMA EXAMINATIONS

General



Four different types of problems will be administered. The Defense of
Therapy Problem and Diagnostic Problem will be administered in the
Adult and Children's Examinations. The Emergency Treatment Problem and
Complication Problem will be administered in the Trauma Examination.

Note that while the instructions indicate that you will have a maximum
of 13 minutes for each problem, in some cases you may finish a problem
earlier.


Defense of Therapy Problem 

This type is designed to test your problem solving ability in
formulating a plan of treatment for a particular patient. You will be given
a protocol of a case. This will include a summary of the pertinent
historical, physical and laboratory data. Relevant x-rays and
information about the diagnosis will be available. If you believe that you
need more data before you can formulate a definitive plan of management,
you may ask the examiner.



You will be given about 3 minutes to read the protocol. At the end
of that time you will be expected to state in your own terms what
the problem is, the plan of therapy you would recommend, and
your reasons for the treatment you select. You will have about 10
minutes to complete this problem.

The examiner may question you from time to time in order to determine
WHY you selected a given approach. He is fully aware that there are
many methods to treat a given problem, but he is interested in how you
arrived at YOUR approach. He may question you about the entire
concept involved in your method or about your ideas behind given parts
of your therapy.

Your score on this part of the examination will not depend so much upon
the method you select as it will upon your reasoning in deciding upon
an approach for this particular patient. (Clinical judgement.)

Complication Problem



This type is designed to test your ability to formulate a plan of
management for a patient in which some complication has developed.

You will be given a summary of a case which will include relevant
historical, physical and laboratory data, including x-rays. The case
description will also include the original diagnosis, the steps followed
in treatment and the existing complications.

You will be given about 3 minutes to read the protocol. At the end of
that time you will be expected to outline a program for the management
of the patient. You may ask the examiner to provide information on the
results of diagnostic or therapeutic procedures if the plan of
management you outline requires such information. You will have about 10
minutes to complete this problem.

The examiner may question you from time to time in order to determine
WHY you selected a given procedure. He is fully aware that there are
many approaches to such problems, but he is interested in YOUR approach.

Your score on this exercise will depend on the skill you demonstrate
in identifying the problem and the reasoning you use in formulating
your plan of management.

Emergency Treatment Problem

This problem is designed to evaluate your ability to obtain the critical
data necessary to initiate treatment, to diagnose a patient's total
problem, to facilitate management and to institute appropriate initial
care for a patient in an emergency situation.

You will be given a brief description of a case as presented in the
emergency room. You will have about 3 minutes to study the problem
and you will then outline a program for the initial management of the
patient - not only with regard to specific actions to be taken, but
also to the order and priority of the actions you recommend. You may
ask the examiner for specific physical and laboratory findings (including
x-rays) in order to make your judgements. The examiner may also
indicate the results of each action as you take it. This discussion
should take about 10 minutes.



You are reminded that the problem revolves around the initial diagnosis
and treatment of the case and has nothing whatever to do with long-term
management, which will be taken up elsewhere. The cases presented
are all taken from the practices of those conducting the examination.

You should use the information given as you would in your own practice.
Assume that you have available to you the personnel and equipment that
you ordinarily would have in your own hospital.

Your score will depend upon your ability to identify the problem effec- 
tively and describe an adequate and effective plan of management. 

Diagnostic Problem

The Diagnostic Problem is designed to test your ability to gather
information concerning a patient and to arrive at reasonable conclusions
concerning his illness. You will be given a brief case description
including such information as the age, occupation, sex and chief
complaint of a patient. Your task will be to question the examiner to
obtain the necessary information about the history, physical, x-ray and
laboratory findings needed to obtain an effective differential diagnosis.
You will be given about 10 minutes to gather the information. At the
end of this information-gathering session you will be required to
present your diagnostic conclusions and your reasons for them. Your
explanation of your diagnostic impressions should take about 3 minutes.



Note that during the data-gathering session the examiner will not
interpret the data for you, but only give you the same information you
might get from a patient. For example, if you ask, "Is there a history
of injury?", the examiner will probably say, "Where?" If you then say,
"To the arm," the examiner will say, "The patient says that he hurt his
arm when he was very small." Therefore, be specific in all your
inquiries; the examiner will not volunteer information; you will have to
formulate your questions so as to elicit the precise information you want.

If you do not obtain sufficient information from a question, pursue the
matter until you are satisfied, but do not waste time exploring what is
obviously a "blind alley." Vary your attack should your questioning
along one line prove unrevealing. You may allocate your time among
various lines of inquiry in any way you prefer, asking additional
questions about any aspect of the investigation at any time during the
interview.



Your score on this examination will not depend on whether you have
arrived at a "correct" diagnosis. It will depend on the skill you
demonstrate in gathering information to arrive at a diagnosis, and the
logic you employ in explaining your conclusions. It is important,
therefore, for you to be sufficiently thorough in exploring reasonable
possibilities that ought to be considered in the differential diagnosis.
However, as in any real situation, you should avoid wasting your time
in asking irrelevant questions or exploring remote and unlikely
possibilities. The problems used in this part of the examination are
typical of orthopaedic practice in general.

THE SIMULATED INTERVIEW 



This examination simulates situations which are quite familiar to you.
If you place yourself in the familiar "role" and conduct yourself
accordingly you should not experience any difficulty with the format of
this examination.

The problems used in these tests are not obscure. They are typical
of general orthopaedic practice. There is no attempt to trick you
with rare or unusually complex problems. They are designed to give you
an opportunity to demonstrate how you would talk with patients and their
families about their illnesses and how you would talk with nurses and
other physicians concerning patients.



You will be given a description of a clinical situation including
diagnosis and the proposed next steps. You will have about 3 minutes
to review the case description. You will then have about 6 minutes
to interview the examiner who assumes the role of the patient, nurse or
another physician. Your task will be to explain to him what problems
might have arisen in the management of the patient, what your plans
are, and to gain his understanding and cooperation. You should attempt
to explain your problem and management in clear, simple terms and gain
his confidence. This part of the oral examination will contain three
such problems.



Your score will depend upon (1) the effectiveness of the information
you provide to meet the requirements of the situation described, and
(2) the skill and tact you demonstrate in communicating to the person
described.




The ability to observe and interpret accurately is an integral part
of the requirements for the successful practice of orthopaedic surgery.
The conclusions drawn from direct observation frequently determine the
diagnosis, decide the course of treatment or determine the efficacy
of it. This examination is directed toward the evaluation
of your ability to observe and correlate information derived from
microscopic slides and x-rays.



After reviewing the material presented, you may be asked to describe
what you see, precisely as you might in a written report, indicating
any abnormalities that may be present.

After you have finished describing what you see you may be asked to
indicate probable causes for any abnormalities you identify.

Your score on this examination will depend mainly upon the accuracy
with which you report your observations.

On January 18th, following the administration of the oral portion of the
1968 Certification Examination, each examiner attending a subject matter
debriefing session was asked to complete a questionnaire sampling his
reaction to the oral examinations. Approximately 220 examiners were
employed in administering these examinations, of whom 194 completed the
questionnaire. These examiners overwhelmingly endorsed the new oral
examinations, although areas for improvement were noted.






The Center for the Study of Medical Education of the University of Illinois
tabulated the responses made by the oral examiners to the 21 items in the
questionnaire. This tabulation is attached for your reference. It will be
my purpose to summarize and interpret this data so that the Board, and more
specifically the Examinations Committee, can make maximum use of the
comments and suggestions made and opinions registered.



Interpretive Skills



The examiners in Interpretive Skills found the orientation session least
helpful. Had the orientation session dealt more with Interpretive Skills
this perhaps would have been different. As it was, two Interpretive Skills
examiners attended no orientation session (Item 3).

The assignment given to the Interpretive Skills examiners was that they
should emphasize the question "What do you see?" rather than "What would
you do?" Whether the examiners followed this line of questioning is not
known; however, there is evidence that they were aware of the restriction
this placed on their examination (Item 2).

The expectations of Interpretive Skills examiners seem to be greater than
the performance of the candidates; therefore they tended to be more critical
of the candidates, their preparation for the examination and their training
programs (Items 4, 10, [illegible] and 20).



Simulated Interview



There was less agreement that the Simulated Interview provided valuable
information about the candidate than about any other section of the
examination. This is in part explained by the content of this portion of
the examination and in part by the strangeness of the examination
technique (Items 1 & 2).**



* [Name illegible], Jr., Director, Office of Education and Evaluation,
American Board of Orthopaedic Surgery, July 1968

** Also reflected in this conclusion is the feeling that candidates are not
prepared for this portion of the examination (Item 20), to the point where
some examiners questioned the fairness of the examination (Item 10).

Trauma

The Trauma examiners were most critical of the cases supplied. There were
a number of reasons to justify this opinion, all of which can be remedied
(Item [illegible]). They were also most critical of the x-rays provided,
which may in part explain why they were critical of the cases (Item 11).

Withal, the Trauma examiners felt that the candidates were well prepared
to manage trauma problems (Item 20).

The Children's Orthopaedic examiners seemed to have some difficulty in
applying the 12-point rating scale (Items 7 and 12).

They also seemed to feel more restricted by the standardized format in
terms of the questions they could ask than the examiners in other areas
(Item 8).



Adult Orthopaedics

The Adult Orthopaedic examiners reflected many of the conclusions already
reported but in no instance did they feel so strongly. This perhaps is
indicative of the more general nature of this examination.



RECOMMENDATIONS

Based on a review of a questionnaire completed by 194 oral examiners
following the 1968 Certification Examination it is recommended:

1. That the standardized oral examinations as administered in 1968
be continued as an integral portion of the certification procedure
of the Board, but that more attention be paid to the selection of
cases and materials.

2. That a person be designated for each subject matter area to serve
on the Oral Examination Task Force and that the person be responsible
for securing and editing the cases and materials and presenting
them orally to the examiners during the orientation session.

3. That orientation sessions be planned in conjunction with meetings
of orthopaedic surgeons held in the fall and winter prior to the
1969 Examinations and on the day prior to the oral examinations
which emphasize:

A. The objectives of the oral examination

B. The role of the examiner

C. The rating of candidates

On January 17 and 18, 1968, following each group of oral examinations the
candidates attended a debriefing session. At the beginning of this session
each candidate was asked to complete a questionnaire giving his opinion
about the examinations, written and oral, he had just completed. Processing
of these questionnaires was delayed until the results of the examination
had been mailed to the candidates so that the opinions expressed would in
no way affect the outcome of the examination.

This report is based on the questionnaires completed by 200 candidates
selected at random, 100 of whom were successful and 100 unsuccessful in
their bid for certification. The data was tabulated by the staff of the
Center for the Study of Medical Education (CSME) of the University of
Illinois. The results give some indication of the validity of the Board's
examinations and are corroborated by the more detailed studies to be
reported by CSME.

Of the 200 candidates in the sample, 32 thought they had been successful
and were, while [illegible] had their feelings of failure confirmed, and 36
were not certain about the outcome. Of the remaining candidates, 21 thought
they had been successful and were not, while [illegible] were successful who
did not think they would be. Applying these ratios to the actual population
taking the examination, it is evident that the majority of the candidates
were pleased or surprised by the results, while less than [illegible]%
thought they had been successful and were not. There will be considerable
correspondence with and about this latter group in the months ahead.

On the basis of this questionnaire the unsuccessful candidate may be
characterized as being weak in factual knowledge, interpretive skills,
and in the application of the knowledge he possesses, but strong in
treating trauma and in doctor-doctor and doctor-patient relations.

The successful candidate may be characterized as being weak, though not
as weak, in factual knowledge, but strong in the treatment of trauma,
the diagnosis and treatment of children's disease, in interpretive skills
and in doctor-doctor and doctor-patient relations. The most critical
difference between success and failure perceived by the candidates appears
to be interpretive skills. This has been verified through comparison
of the examination results with the ratings submitted on the "Candidate
Evaluation Form."

The candidates' opinions were solicited concerning the overall examination
process as well as the various portions of the examination. Two thirds of
our sample felt that the examinations adequately assessed their knowledge
and understanding of orthopaedic surgery, while a third thought that the
examination contained a great deal of esoteric material. Only 3 candidates
felt the examination was too easy and two of them failed, while about 10%

There was a difference of opinion over the adequacy of the content covered
in the examination, with the successful candidates feeling that it was not
adequate. It should be noted that the successful candidates tended to be
critical of the examination while the unsuccessful candidates seemed
satisfied with things as they were.



The multiple-choice examination was considered to be the most difficult,
irrelevant, ambiguous and unfair portion of the examination. This result
was to be expected, for the "best answer", forced-choice approach is a
frustrating experience, especially to the highly intelligent.



The patient management problems were considered confusing by a third of
the sample but more relevant than the multiple-choice examination and,
therefore, fairer and not as difficult.



The trauma oral was considered the least difficult and most relevant
portion of the examination. The children's and adult orals followed as
a close second and third while Interpretive Skills and the Simulated
Interviews were a distant fourth. It is apparent that the candidates
would prefer a certification examination made up entirely of oral
examinations if given the choice. There is some evidence to indicate
that the order of preference for the orals reflects the content of the
practice of the orthopaedic surgeon.



The candidates felt that the oral examiners did a good job of allowing the
candidate to demonstrate his knowledge and ability, of answering questions
to the candidate's satisfaction and of consciously working to place the
candidate at ease. Only two candidates in our sample felt that examiners
were "rude" and both were successful.

Specific questions were asked about the adequacy of the visual aids used
in the examination. What criticism there was was directed toward the
written examinations, which again was to be expected for, instead of actual
X-rays and slides, photographic reproductions were used in these portions
of the examination. However, better than 80% of the candidates felt that
the X-rays and slides "were clear and easy to read."

On the basis of the response of the candidates immediately following the
1968 Certification Examination we may conclude that the techniques and
materials employed to assess competency were relevant and the coverage
was adequate. Furthermore the differences between success and failure
can in part be explained by different preparation for and perceptions of
the examination.

Further work with these questionnaires should be undertaken before broad
conclusions are drawn from these findings. Two tasks which should be
completed are: (1) the tabulation of the data provided by all the
candidates to verify the conclusions drawn from the sample, and (2) an
analysis of the data by group and panel to identify any constant bias
which should be eliminated from future examinations.

American Board of Orthopaedic Surgery
Certification Examination, January 1968



QUESTION:

Response columns: Strongly Agree | Agree | [illegible] | Disagree |
Strongly Disagree | Didn't Answer

1. The examination as a whole had sufficient depth to adequately
assess my knowledge and understanding of orthopaedic surgery.
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failed 
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of an esoteric 
nature. 


failed: 5 | 28 | 24 | [illegible] | 8

3. My training program did a good job of preparing me for this
examination.

passed: [illegible] | 55 | 20 | 7 | 2
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C. The multiple choice 
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questions in which 


two answers could be defended as correct.
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9. The examination 
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was too easy. 
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43 
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[0. The examination as 


a whole covered all the important topics in orthopaedic surgery.

passed: 3 | 31 | 17 | 40 | 9 | 0


11. The technique used 
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else was confusing 
to me and I did not get a chance to demonstrate my
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nation were confusing 
to me and I did not get a chance to demonstrate my abilities.

13. The examination as a whole was fair.

14. The system of granting my certificate only after I have taken an
examination is a
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1. Gave* me a chance 
to demonstrate my abilities in some important areas of orthopaedic
surgery.

2. Most topics covered were important in orthopaedic practice.

3. Most topics covered were irrelevant to orthopaedic practice.

4. Examination procedures were confusing to me.

5. Examination was [text missing in source]

6. Examination was too easy.

7. Examination was too difficult.

8. The X-rays, slides and photographs were inadequate.

9. Examiner was rude to me.

10. Examiner was skillful in putting me at ease.

11. Examiner did NOT give me a chance to answer questions adequately.

12. Examiner gave me ample opportunity to show what I could do.

A few, in each case, commented on largeness of the room and poor lighting.




Appendix 20 



ORAL EXAMINATION

AMERICAN BOARD OF ORTHOPAEDIC SURGERY

Part II: January 1968

V. Conference

Case Description No. 35



The patient is a [age illegible]-year-old Caucasian girl whose family
first noted a curve in her back one year ago. There is no family
history of scoliosis or other skeletal deformities. The patient
herself has no complaints. Her menarche was six months ago.

Four months ago she was seen in the scoliosis clinic of this
hospital; physical examination did not suggest any diagnosis other
than idiopathic scoliosis (i.e. no cafe-au-lait spots, no positive
neurological signs, no leg length discrepancy, and no muscle
weakness). There was a right dorsal scoliosis with moderate rotation.
The curve improved somewhat with traction and with bending. When
she was erect the pelvis was level, and a plumb line dropped from
C7 fell in the center of the intergluteal cleft. No other
deformities or abnormalities were noted.



X-rays were taken at that time (Series A on the view box).
She was scheduled for re-evaluation in three months; at that time
(one month ago) she was re-examined (no changes noted in
examination) and new X-rays were taken (Series B on the view box).

She was admitted to the hospital for further evaluation.
Routine laboratory tests are normal. The working diagnosis is
idiopathic scoliosis.



Prepared with the assistance of the 
Center for the Study of Medical Education 
University of Illinois, College of Medicine 
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ORTHOPAEDIC TRAINING STUDY 



Working papers for: Special Meeting of Examination Committee

6 - 8 * 11 



INTRODUCTION



The major purpose of this special meeting is to prepare a
blueprint or set of specifications for the certification process
as conducted by the Board and to outline procedures to be
followed in determining the rate and methods of implementing
the blueprint. In this connection it should be noted that in
developing the blueprint we should try to design an "ideal" one
that will serve as a long-term guide to policy. Its practical
implications in regard to present practice and the feasibility
of change should not enter into consideration at this stage;
only after the guide has been completed and approved should its
implications for format and its application to a specific
forthcoming examination be considered.



These working papers have been prepared to familiarize
members of the Examination Committee with principles that are
commonly followed in the development of test specifications.
The illustrations provided in the text are merely suggestive;
clearly the nature of the categories and the extent of detail
in any blueprint must be determined by those familiar with
professional requirements in the field under consideration.
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THE PURPOSES OF AN EXAMINATION BLUEPRINT 



The primary function of a blueprint is to enable a policy-making
group to maintain effective, systematic control of the
nature and content of an examination, while delegating the
construction and assembly of examination materials to others. The
time gained through such delegation and decentralization enables
the parent group to concentrate on review and revision of the
examination as a whole to ensure that it meets specified
purposes. Without this delegation such a group generally finds
itself so caught up in operational details that there is little
time to establish and review overall policy and procedure.



In short the blueprint or specifications for an examination
are intended to serve the same functions in selecting and
assembling examination formats and materials as does the architect's
blueprint in selecting and assembling building materials. Thus,
an effective blueprint will stipulate in detail the categories
to be sampled and the weight of each category in the total
examination. As will be seen later the number of categories that
should be stipulated is far greater than would result from a
classification of broad subject areas. On the basis of the
decisions concerning classifications responsible sub-groups can then
prepare cases, questions, etc., of the type and in the quantities
required to meet the specifications with respect to both content
and process. These materials can then be stored and retrieved in
a logical fashion according to the specifications. From such a
library one individual can then assemble an examination for review
and revision by the full examination committee.

After the examination is administered, the results can be
reported in terms of the categories of the blueprint; profiles of
individuals and types of training can be prepared; and progress
can be charted from year to year. In addition, new examination
committees will have the benefit of established guidelines as well
as a point of reference for the review and updating of the
examination in the light of progress in orthopaedics and in the science
of examining.






STEPS IN PREPARING A BLUEPRINT 



Tests are thought of by evaluation specialists as work
samples from which decisions are made concerning the capabilities
of the persons being tested. For example, physicians do a number
of things: they take histories, they set broken bones, they
teach residents, they make diagnoses, they decide on courses of
treatment, they reassure anxious parents, they order X-rays.
Some tests are designed to be direct samples of the performance
of these tasks; most are indirect. For example, a physician
presumably needs to know the anatomy of the back in order to operate
on a slipped disk. Thus, a question requiring the candidate to
recall anatomical information about the back is a direct sample
of his ability to recall information but at best it is only an
indirect measure of his operative skill. Observation of the
candidate actually performing the operation would be a direct
measure of this skill.

Whether the tasks finally included in an examination are
direct or indirect measures of performance, it is useful in
developing a blueprint to focus on the idea that, essentially,
tests are samples of behavior or performance.

The Concept of Domain

Almost everything people do can be divided into three broad
performance categories or domains of behavior: the cognitive,
the affective and the psychomotor. This concept of domain
provides a useful starting point in developing a blueprint.

Those tasks which require predominantly intellectual skills
are placed in the cognitive domain; e.g., recalling information,
using information to solve problems, weighing several factors to
arrive at a decision, predicting an occurrence on the basis of
specific information. Most written tests or certifying
examinations sample this domain almost exclusively.

The affective domain includes those tasks in which feelings
and attitudes predominate; e.g., relating to patients,
demonstrating integrity, relating to colleagues, manifesting
self-control in emergencies. These are the things most often included
in the term professional behavior. Some types of oral tests and
interviews are designed to probe these qualities; at present, in
the certification procedures of the Board, such evidence
is collected almost exclusively by the Committee on Eligibility.

The psychomotor domain refers to those activities in which
manipulative skill is of primary importance; e.g., performing
surgery, setting broken bones, applying splints. Rating forms
for use in observation are often employed to measure performance
in this domain.



It is recognized that many tasks overlap all three domains.
For example, in performing a physical examination of a severely
injured but conscious patient the orthopaedist is operating at a
high level in all three domains. For purposes of designing a
blueprint, however, it has been found useful to analyze and
classify the domains separately.



The Two-Way Grid

The classification of tasks into domains is only the beginning.
A task within any domain must be further specified with
respect to both the nature of the process and of the content which
it samples. The critical incident study outlines 92 different
areas (e.g., skillful gathering of information, developing
diagnoses, exercising judgment in deciding on care, exercising skill
in operative procedures, etc.). These tasks are all processes,
but they do not exist in a vacuum. A man must be skillful in
gathering information about something and must decide on the care
of something. A physician can go through a process in treating
some patients which an objective observer would interpret as
showing "good judgment" but the same physician could go through
a similar process in treating other patients that the same observer
would label as showing "poor judgment." Before decisions can be
made about whether a physician usually demonstrates good judgment
it is necessary to obtain samples of his work which provide
evidence about his judgment in treating specific conditions in various
types of patients. These specifics represent the content elements
of the tasks.



In a blueprint these two elements of work samples or test
questions (content and process) are usually shown in the form of
a two-dimensional chart or grid. Content factors are
conventionally placed on the vertical axis of the grid and process
factors on the horizontal axis. Any specific test question is
located on this chart in terms of the two axes. Figure I is an
example of such a grid in the cognitive domain.
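
As a rough modern illustration (not part of the original working papers), the two-way grid described above can be sketched as a small table of counts. The axis labels below are hypothetical, chosen only to make the sketch concrete; a real blueprint would use the categories agreed upon by the committee.

```python
from collections import Counter

# Hypothetical axis labels, used only for illustration.
CONTENT = ["trauma", "inflammatory disease", "neoplasm"]    # vertical axis
PROCESS = ["recall", "interpretation", "problem-solving"]   # horizontal axis


def build_grid(questions):
    """Tally (content, process) pairs into a content-by-process grid.

    Rows follow CONTENT and columns follow PROCESS, so each question
    occupies exactly one cell of the grid.
    """
    counts = Counter()
    for content, process in questions:
        if content not in CONTENT or process not in PROCESS:
            raise ValueError(f"cannot place question: {content!r}, {process!r}")
        counts[(content, process)] += 1
    return [[counts[(c, p)] for p in PROCESS] for c in CONTENT]


# Four made-up questions, each already classified on both axes.
sample_questions = [
    ("trauma", "recall"),
    ("trauma", "problem-solving"),
    ("inflammatory disease", "interpretation"),
    ("trauma", "recall"),
]

grid = build_grid(sample_questions)
for label, row in zip(CONTENT, grid):
    print(f"{label:<22}{row}")
```

Reading across a row shows how the questions on one content area are spread over the processes; reading down a column shows the content coverage of one process.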
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To illustrate the further subdivision of each axis consider 

the following question from the May 1964 Part I Orthopaedic

Examination.

History

A 4-year-old boy had the onset of pain in his right
knee three days after a fall in which he suffered
abrasions of his face. The patient developed fever
and swelling of his right knee. The patient was
treated with antibiotics for three weeks. Although
the patient's fever was controlled, the patient
continued to have swelling of his knee and restriction
of motion. Six weeks after the onset the patient
had a 45° flexion contracture of his knee, tenderness
over the medial femoral condyle and parapatellar
swelling. See the X-ray of the knee and a histologic
section of tissue from the lesion.

The lesion is:

a) traumatic,

b) benign neoplasm,

c) inflammatory,

d) malignant neoplasm.



With respect to process the sample question requires the
candidate (1) to read and interpret X-rays and histologic data,
(2) to interpret clinical information and (3) to use the
information (both verbal and visual) in arriving at a diagnosis.
Such activities should therefore be specified along the process
axis.

With respect to content this question appears to deal with
(1) a child, (2) some sort of inflammatory disease, and (3)
pathology of the knee. However, if the content axis contained
such topics as (1) children, (2) disease, (3) knee, etc., it
would be difficult to know where to classify the question since
clearly it involves knowledge about all three. Consequently,
on the content axis, at least, it is necessary to specify several
dimensions, each of which contains mutually exclusive
sub-categories. For example, some of these dimensions and
sub-categories might be organized as follows:
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Dimension 



Sub -Category 

r*3W*» — “ — 



rype 



■ urn. j 
Adult 



f * f| *T i 



lease process 



auma 

In f I amrna tory disease 
Neoplastic 
Etc. 



£J»C 

B 



* *Et’c « . 

The sample blueprint in Figure II suggests how the resulting
blueprint might look.

Once the dimensions and sub-categories have been fully specified
in each domain it is necessary to assign weights to each in
order to assure that any given examination is an appropriately
balanced sample of the content and processes which determine
competence. As Figure II indicates, the weights for the sub-categories
in each dimension should add up to 100%. If the arbitrary weights
shown on the sample were followed, 60% of the questions would deal
with clinical problems; 70% of these clinical problems would deal
with trauma. It would NOT be necessary, however, for the process
categories to be equally represented in each sub-category of
content; thus in the example there is no implication that 20% of the
trauma questions should sample recall; in this illustration it
would be necessary merely to assure that overall about 20% of the
questions in the total examination be recall. If the trauma
questions were mostly recall then a higher proportion of questions
in other categories would need to entail problem-solving.
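
The weighting rule described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
Board's procedure: the 60% clinical, 70% trauma and 20% recall figures
come from the text, while every other category name and weight, and
the function names themselves, are hypothetical placeholders.

```python
# Hypothetical blueprint: each dimension maps sub-categories to
# percentage weights. Only "clinical problems" = 60, "trauma" = 70 and
# "recall" = 20 come from the text; all other entries are invented.
BLUEPRINT = {
    "problem type": {"clinical problems": 60, "other": 40},
    "disease process (within clinical problems)": {"trauma": 70, "other": 30},
    "process": {"recall": 20, "other": 80},
}


def check_weights(blueprint):
    """Return the names of dimensions whose weights do not total 100%."""
    return [name for name, weights in blueprint.items()
            if sum(weights.values()) != 100]


def question_counts(total_questions, weights):
    """Convert the percentage weights of one dimension into question counts."""
    return {name: round(total_questions * pct / 100)
            for name, pct in weights.items()}


def recall_needed_elsewhere(total, recall_pct, trauma_total, trauma_recall):
    """Recall questions still needed outside trauma to meet the overall target.

    Returns (recall questions still needed, questions outside trauma).
    The target applies to the whole examination, not to each content
    sub-category separately.
    """
    target = round(total * recall_pct / 100)
    return max(target - trauma_recall, 0), total - trauma_total
```

On an assumed 200-question examination these weights would give 120
clinical questions, 84 of them on trauma, and 40 recall questions
overall. If 30 of the trauma questions were recall, only 10 more recall
questions would be needed among the remaining 116, which is the
balancing effect the paragraph describes.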



III. CHARACTERISTICS OF AN IDEAL BLUEPRINT



Figure II may look rather complex and unwieldy but the process
of creating and using such a grid is usually not a difficult
one. Most committees have found that the task proceeds rapidly
once the ground rules are agreed upon. In setting these ground
rules the essential characteristics of a useful blueprint must
be considered:






The first is completeness. This means that as far as possible
the blueprint should cover all the important areas of concern.



The second important characteristic of a blueprint is that the
topics included in each dimension and sub-category be mutually
exclusive. To illustrate the problem, the categories infant,
child, adult, geriatric patient do not meet this criterion and
because they do not they would create difficulties in classification.



The third important characteristic of a blueprint is that
the categories be meaningful and easy to apply. They should be
categories that are clear to authors of questions, examinees,
the Board and training directors, and they should be ones that
all would recognize as relevant.




THE RELATION BETWEEN THE BLUEPRINT AND PRESENT PRACTICE 



It should be emphasized that a chart of the type shown in
Figure II is more of a master plan than a set of operational
specifications for a specific examination. When a blueprint is
first established it is often impossible to put the entire plan
in operation immediately because of expense or the imperfect
state of the science of examining, but it is extremely important,
if the blueprint is to be a guideline both now and in the future,
to design it as though the ideal could be attained. Skills in
constructing and interpreting tests are improving at an impressive
rate, and a blueprint of the ideal will serve to indicate
the direction that experimentation must take to improve the
certification process.



It is to be expected, however, that much of the blueprint
will have immediate applicability and will facilitate the
delegation of specific responsibilities for the construction and
assembly of examination materials, and for their storage and
retrieval.



PRE-MEETING TASKS



In order to expedite the business of the meeting it is
requested that two tasks be completed by each participant before
the meeting.

1. Review the enclosed samples of evaluation techniques
now used by the Board to determine how each might be
categorized in terms of content and process, and
therefore what types of dimensions and sub-categories
will probably be needed on each axis of the two-way
grid.

2. Assign priorities to the process categories included
in the critical incident study (see procedure below)
and return the information requested to Mr. Levine.



Procedure for Assigning Priorities



A strong case could be made that all the categories listed
in the critical incident study are highly important, but it is
understood that individuals are stronger in some areas than
others. Assume that you had to recommend ONE orthopaedist to
serve as a model of the competent orthopaedist in private
practice. In weighing the strengths and weaknesses of the
candidates for this award, some categories would be more important
than others. Use this criterion in assigning priorities.

FIRST:   Each of the behaviors in the critical incident
         study has been typed on a separate card. Look
         at each card and place it in one of three piles.
         Three header cards have been provided for
         labeling these three piles. Pile A should contain
         those categories you consider MOST IMPORTANT.
         Pile B should contain those categories you
         consider of MODERATE IMPORTANCE and Pile C should
         contain those categories you consider as LEAST
         IMPORTANT. The three piles should contain about
         equal numbers of cards. If there is an imbalance
         Pile B should contain more cards than A or C.

SECOND:  Take Pile A and divide it into two piles. A1 should
         contain the MORE IMPORTANT behaviors and A2 those
         of LESSER IMPORTANCE. The two piles should be of
         about equal size but in case of an imbalance A2
         should have more cards than A1.

THIRD:   Take Pile C and divide it into two piles. C1
         should contain the MORE IMPORTANT behaviors and
         C2 should contain those which are of LESSER
         IMPORTANCE. The piles should be of about equal size
         but in case of an imbalance C1 should have more
         cards than C2.

FOURTH:  Place the header cards A1, A2, B, C1, C2 on TOP
         of the appropriate piles; put a rubber band around
         each pile.

FIFTH:   Mail the cards in the enclosed envelope before
         May 2nd.
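
The balance rules in the steps above can be expressed as a small check
on the five pile sizes. The following Python sketch is illustrative
only: the function names are hypothetical, and the 25% tolerance used
for "about equal" is an assumption, since the procedure gives no
numerical limit.

```python
def about_equal(x, y, tolerance):
    """True if two counts differ by no more than `tolerance` of the larger."""
    return abs(x - y) <= tolerance * max(x, y, 1)


def check_piles(sizes, tolerance=0.25):
    """Return a list of rule violations for pile sizes keyed A1, A2, B, C1, C2.

    `tolerance` is an assumed reading of "about equal"; the procedure
    itself does not define one.
    """
    a = sizes["A1"] + sizes["A2"]   # Pile A before it was divided
    b = sizes["B"]
    c = sizes["C1"] + sizes["C2"]   # Pile C before it was divided
    problems = []

    # FIRST: A, B and C about equal; any imbalance should favor B.
    if not (about_equal(a, b, tolerance) and about_equal(b, c, tolerance)
            and about_equal(a, c, tolerance)):
        problems.append("A, B and C are not about equal")
    if b < a or b < c:
        problems.append("B should not be smaller than A or C")

    # SECOND: A1 and A2 about equal; any imbalance should favor A2.
    if not about_equal(sizes["A1"], sizes["A2"], tolerance):
        problems.append("A1 and A2 are not about equal")
    if sizes["A2"] < sizes["A1"]:
        problems.append("A2 should not be smaller than A1")

    # THIRD: C1 and C2 about equal; any imbalance should favor C1.
    if not about_equal(sizes["C1"], sizes["C2"], tolerance):
        problems.append("C1 and C2 are not about equal")
    if sizes["C1"] < sizes["C2"]:
        problems.append("C1 should not be smaller than C2")

    return problems
```

For instance, assuming one card for each of the 94 categories, a
return of 14, 16, 34, 16 and 14 cards in piles A1, A2, B, C1 and C2
would satisfy every rule under this reading.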



I. INTRODUCTION



The Orthopaedic Training Study was undertaken 2-1/2 years ago
as a joint project of the Center for the Study of Medical
Education and the American Board of Orthopaedic Surgery. The
purpose of the study was to analyze the development of competence
in orthopaedic surgery in order to see if certification could be
based on performance rather than on the passage of some arbitrary
length of time. The first stage of the study required the
development of a behavioral description of the components of
competence that characterize the effective orthopaedist. The second
stage required the analysis of the techniques of evaluation used
by the Board when the study was started. The third stage required
that orthopaedic surgeons and evaluation specialists work in
concert to devise new procedures and improve the old. The study
is now in the middle of this third stage.

During the past eight months a number of task forces have
been assembled to assist in this process of developing new
evaluation procedures and reviewing and revising the older ones.

The oral examinations of the Board represent one of the most
important of its evaluation procedures. They require a very heavy
investment of time and energy on the part of the orthopaedic
profession. In January 1966, for example, over 150 examiners
spent two days examining approximately 400 candidates, a
commitment of over 4,000 man hours. It is important that this time be
of maximum benefit to the training and certification of orthopaedic
surgeons.



II. PURPOSE OF THE TASK FORCE



Over the past 2-1/2 years, there has been a considerable
amount of research on the effectiveness of the present oral
examination procedures of the Board, and the beginnings of
innovation in these procedures. The mission of this Task Force is to
review the research, the innovations already made, and the
innovations that have been suggested, and make recommendations to the
research staff for further research and to the Board for
implementation of the findings produced so far by the study. In addition









to Chicago and given a two-day training session at which they
discussed and practiced the administration and rating of the
new examinations.



The new examinations were tried out and analyzed during the
January 1966 Part II examinations. The results of this
administration were as follows: (1) For the most part the Patient Interviews
met acceptance by both the examiners and the candidates. Both
were somewhat puzzled about the rating of the conference
examination. (2) The Patient Interviews met reasonable standards of
rater reliability for tests of their type; the conference did not.
(3) The new orals did not correlate very highly with other
evaluative techniques. (4) Different teams of examiners used different
standards (some were easier than others) on the Patient Interviews.
This was also true of the traditional oral examinations.



This study raised some questions about the validity and
reliability of oral techniques which could not be answered from
the data available. A fourth study was therefore inaugurated in
the fall of 1965 to answer these questions. In this study the
Patient Interviews and the traditional Adult Orthopaedic
examination were administered to 236 residents at all levels of
training in five different areas in the United States. Provision
was made to obtain reliability data not only on two raters
reviewing the same examination but also on two examiners asking different
sets of questions using the same type of examination. These data
are still being processed. Preliminary review of some data
indicates that both the new and the traditional oral examinations are
much less reliable than written examinations.

IV. EVALUATING BOTH THE NEW AND TRADITIONAL ORAL EXAMINATIONS

Evaluation specialists have developed certain criteria for
evaluating testing techniques. The results of these studies can
best be interpreted and applied if they are reviewed in terms
of the five criteria: (1) relevance, (2) validity, (3) reliability,
(4) efficiency, and (5) effect on the educational program.
Each of these is discussed below.

A. Relevance

A good examination is one which samples areas of competence
directly related to the purposes of the examination.
Since the purpose of the Orthopaedic Board examination is to
certify competency in the practice of orthopaedics, the
examinations should sample these areas of competence. The
new oral examinations were especially designed to sample

III. DESCRIPTION OF RESEARCH



There have been four studies so far which have yielded
important information which bears upon the oral examinations
conducted by the American Board of Orthopaedic Surgery.

The first of these was the aforementioned critical incident
study. This study yielded a behavioral description of competence
in orthopaedic surgery which covered 94 categories. (See Document I.)
The examination committee of the Board then used these descriptions
as a guide in developing a set of tentative examination
specifications or blueprint which would serve as a guide to the
development of future examinations. (See Document II.) The importance
of this study to the oral examinations is that certain behaviors
such as "ability to relate to patients" and "ability to relate to
colleagues" which are identified as components of effective
performance by the critical incident study can most readily be
assessed by some type of oral examination.



The second research study was an analysis of the traditional
oral examinations used by the Board. In January 1965 a research
team consisting of five physicians and three educators observed
144 of the 2,000 individual examinations conducted by the Board.
The team recorded 6,868 questions during their observations and
categorized the type of competence being assessed by each. The
data indicated that these questions for the most part measured
ability to recall (rapidly and under stress)
isolated fragments of information.

As a result of these studies, the project staff developed
three new types of orals which attempt to measure some areas
of competence listed in the critical incident study which were not
being measured by other techniques. These examinations were:

(1) The Simulated Patient Diagnostic Interview, which requires a
candidate to play the role of physician and elicit information
from an examiner who plays the role of a patient during the history
taking session and who also provides information on the physical
examination and laboratory studies. At the end of this information
session the examinee must explain his diagnostic impressions to
the examiner or examiners. (2) The Simulated [illegible]
Interview, which requires a candidate as a simulated physician to
explain a proposed treatment to the examiner as a simulated patient.
(3) The Simulated Patient Management Conference, in which a group
of five candidates simulate a conference at which they discuss
the treatment of several cases. In order to administer these
examinations successfully, approximately 35 examiners were brought

such areas so it would be hard to question their relevance,
although some might claim that the areas of competence
sampled by the new orals were less important than some other


If the content analysis of the traditional orals is
accurate, then much of the material covered by these
examinations is open to the charge of lack of relevance since it has
often been found that recall of isolated facts may have little
relationship with the ability to use information to solve
problems. It is probable that some oral examiners use
questions which are much more relevant than others.



B. Validity

It is not sufficient to state that a test measures a
particular type of ability. There must be evidence that it
does so. It is not always easy to obtain such evidence.



One way to approach the problem of validity is to see if
the content of the test coincides with the test's stated purpose.
The new oral tests simulate typical situations which require the
abilities evaluated by the test. It may be, however, that the
simulation is not close enough to the situation being simulated
to provide adequate information on the candidate's competence.
The traditional orals were designed to gain some information
on the candidate's understanding of various important concepts
in orthopaedics, and most of the questions are selected with
that purpose in mind. The possibility exists, however, that they
evaluate recall rather than understanding.

Another way to test the validity of a test is to see if
candidates perform on the test the way they perform in actual
situations when observed over a long period of time by
competent observers. Such information is now being gathered. It
must be emphasized that there are some areas of competence
evaluated on tests that observers seldom have the
opportunity to evaluate.

A third way to measure test validity is to see if the
test results agree with some hypothesis concerning the areas
of competence being measured. For example, residents beginning
their training should do less well than residents ending their
training. Such data is now being gathered, and it is hoped
that a preliminary report can be presented to the Task Force.



C. Reliability

Reliability has to do with the consistency with which a
particular test measures a particular factor. The reliability
of a test is greatly affected by errors caused by disagreements
among raters and by lack of adequate sampling of the behavior
being evaluated. Oral examinations are inherently less reliable
than objective written tests, and there is evidence that both the
new and traditional orals are relatively unreliable. While this
unreliability does not obviate the use of such tests, it does
indicate that they are best used in combination with other
measures of competence as part of test batteries. It also indicates
the importance of training examiners and standardizing case
materials and procedures to minimize this unreliability.
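
Disagreement among raters can be given a number by correlating the
scores two raters assign to the same candidates. The report does not
say which statistic the study used; the Python sketch below uses the
Pearson correlation purely as a common illustration, and the scores
shown are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings of the same six candidates by two raters.
rater_1 = [3, 4, 2, 5, 4, 3]
rater_2 = [2, 4, 3, 5, 3, 3]
agreement = pearson(rater_1, rater_2)
```

A value near 1 would mean the two raters rank candidates almost
identically; lower values reflect the kind of rater disagreement that
reduces reliability.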

D. Efficiency



It is obviously impossible to test all areas of competence
with complete reliability. It is therefore important to use
resources wisely and to avoid concentrating all the testing
in one area while leaving other areas unsampled.

The content analysis seems to indicate that this is the most
serious failing of the traditional orals. They seem to duplicate
the same factors as the multiple choice examinations. The new
orals are an attempt to measure some additional areas of
competence not evaluated by other examinations.

E. Effect on the Educational Program



Examinations have an obvious effect on the attitudes of
students and on the things they study. If an examination covers
most of the important areas of competence, the student's
attention will be directed toward these areas. While it is
desirable to focus the attention of residents on content to
facilitate their review of the important concepts in a field,
too much emphasis on content without corresponding attention
to process will encourage rote memorization. Little attention
will be paid to understanding or application, and less to the
non-cognitive aspects of effective performance.






SUGGESTIONS BY THE STAFF OF THE ORTHOPAEDIC TRAINING STUDY 

As a result of the studies and analysis above, the
following suggestions were made by the staff of the
Orthopaedic Training Study:



Standard procedures for all oral examinations should be
adopted. This implies the development of detailed
instructions for conducting the examination, and standardized rating
forms and case materials for the conventional examinations
analogous to those used for the new orals.

Systematic training programs for oral examiners and
question authors should be instituted to insure continual
updating of the question pool and renewal of the pool of
experienced examiners for both oral and written tests.

Consideration should be given to alternative methods
for establishing passing grades at the requisite level of
competence rather than by a "curve" or some other purely
statistical criterion. This may very well require reporting
and judging in terms of profiles of performance rather than
simple summations or averages. The final judgment on this
point will be made by the Task Force on Scoring but this Task
Force should be aware of the possibility.



In addition to the above suggestions the Center has continued its
research into oral examining techniques. The following new techniques
have been explored for consideration for introduction into the oral
examining process.



A. An oral examination of the cognitive aspects of surgical
skill. A copy of a tentative rating form and description of
the examination is enclosed. (See Document III.)

B. An oral examination focusing on the ability of an examinee
to [illegible] and defend therapy. A copy of the rating form and
description of the examination is enclosed. (See Document
[number illegible].)

C. An oral examination used to analyze the ability of the
examinees to observe visual [illegible]

LOOKING TO THE FUTURE

The suggestions made earlier are relatively easy to implement
within the present structure of the oral examinations. It
may be, however, that the whole present organization of oral
examinations might need to [illegible] multiple choice
examination is well designed to measure a wide sample of knowledge



